talk-data.com
Activities & events
| Title & Speakers | Event |
|---|---|
|
Berlin Buzzwords 2025 Conference Interviews
2025-09-12 · 17:00
Kacper Łukawski – guest @ Qdrant, Filip Makraduli – Founding ML DevRel engineer @ Superlinked, Atita Arora – guest, Brian Goldin – Founder and CEO @ Voyager Search, André Charton – Head of Search @ Kleinanzeigen, Manish Gill – Engineering Manager @ ClickHouse
At Berlin Buzzwords, industry voices highlighted how search is evolving with AI and LLMs.
Together, their perspectives show a common thread: search is regaining center stage in AI, with scaling, hybridization, multimodality, and domain-specific enrichment shaping the next generation of retrieval systems. Kacper Łukawski, Senior Developer Advocate at Qdrant, educates users on vector and hybrid search. He highlighted Qdrant's support for dense and sparse vectors, the role of search with LLMs, and his interest in cost-effective models like static embeddings for smaller companies and edge apps. Connect: https://www.linkedin.com/in/kacperlukawski/ Other interviewees: Manish Gill, André Charton, Filip Makraduli, Brian Goldin, Atita Arora. |
DataTalks.Club |
|
Streamlining Data Pipelines with MCP Servers and Vector Engines
2025-07-15 · 02:04
Kacper Łukawski – guest @ Qdrant, Tobias Macey – host
Summary: In this episode of the Data Engineering Podcast, Kacper Łukawski from Qdrant talks about integrating MCP servers with vector databases to process unstructured data. Kacper shares his experience in data engineering, from building big data pipelines in the automotive industry to leveraging large language models (LLMs) for transforming unstructured datasets into valuable assets. He discusses the challenges of building data pipelines for unstructured data and how vector databases facilitate semantic search and retrieval-augmented generation (RAG) applications. Kacper delves into the intricacies of vector storage and search, including metadata and contextual elements, and explores the evolution of vector engines beyond RAG to applications like semantic search and anomaly detection. The conversation covers the role of Model Context Protocol (MCP) servers in simplifying data integration and retrieval processes, highlights the need for experimentation and evaluation when adopting LLMs, and offers practical advice on optimizing vector search costs and fine-tuning embedding models for improved search quality.
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey and today I'm interviewing Kacper Łukawski about how MCP servers can be paired with vector databases to streamline processing of unstructured data.
Interview: Introduction. How did you get involved in the area of data management? LLMs are enabling the derivation of useful data assets from unstructured sources. What are the challenges that teams face in building the pipelines to support that work? How has the role of vector engines grown or evolved in the past ~2 years as LLMs have gained broader adoption? Beyond its role as a store of context for agents, RAG, etc., what other applications are common for vector databases? In the ecosystem of vector engines, what are the distinctive elements of Qdrant? How has the MCP specification simplified the work of processing unstructured data? Can you describe the toolchain and workflow involved in building a data pipeline that leverages an MCP for generating embeddings? Helping data engineers gain confidence in non-deterministic workflows; bringing application/ML/data teams into collaboration for determining the impact of e.g. chunking strategies, embedding model selection, etc. What are the most interesting, innovative, or unexpected ways that you have seen MCP and Qdrant used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on vector use cases? When is MCP and/or Qdrant the wrong choice? What do you have planned for the future of MCP with Qdrant?
Contact Info: LinkedIn, Twitter/X, Personal website.
Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements: Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
Links: Qdrant, Kafka, Apache Oozie, Named Entity Recognition, GraphRAG, pgvector, Elasticsearch, Apache Lucene, OpenSearch, BM25, Semantic Search, MCP == Model Context Protocol, Anthropic Contextualized Chunking, Cohere. The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA |
Data Engineering Podcast |
|
MLOps London March - Talks on ML Inference and Vector Databases
2024-03-12 · 18:00
📽️ This session will be recorded and uploaded to YouTube within 48 hours after it finishes. MLOps London is back in March 2024 with talks on production machine learning, databases, LLMs, DevOps, and data science. Unusually, the plan is to run a virtual-only event this time.
AGENDA:
⏱️ 6:00 pm 🎤 How to scale and secure ML inference right alongside your data 🧔🏻 Tobie Morgan Hitchcock, CEO & Co-Founder of SurrealDB. Using traditional ML training and models, learn how the secure and isolated Rust-based SurrealML environment within SurrealDB can help developers and organisations achieve greater efficiency and security with ML inferencing. At the same time, we'll introduce methods of simplifying machine learning pipelines within organisations, enabling developers to build advanced applications quicker and bring machine learning logic right alongside critical data.
⏱️ 7:00 pm 🎤 Deconstructing Embedding Models 🧔🏻 Kacper Łukawski, Developer Advocate at Qdrant. We will delve deep into the tokenizer's fundamental role, shedding light on its operations and introducing straightforward techniques to assess whether a particular model is suited to your data based solely on its tokenizer. We will explore the significance of the tokenizer in the fine-tuning process of embedding models and discuss strategic approaches to optimize its effectiveness. |
MLOps London March - Talks on ML Inference and Vector Databases
|
|
Kacper Łukawski: The Challenges of Making Vector Search Billion-scale
2023-12-04 · 12:01
Kacper Łukawski – guest @ Qdrant
Join Kacper Łukawski as he delves into 'The Challenges of Making Vector Search Billion-scale.' 🔍🌐 Explore the intricacies of semantic search with large-scale embeddings and discover the lessons learned from scaling a vector database at Qdrant. Dive deep into design choices and the robust infrastructure behind them in this enlightening session.💡🚀 #VectorSearch #Scaling #semantics ✨ H I G H L I G H T S ✨ 🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍 Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️ Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear |
DATA MINER Big Data Europe Conference 2020 |
|
July 2023 Computer Vision Meetup (Virtual - EU and Americas)
2023-07-13 · 17:00
Zoom link: https://voxel51.com/computer-vision-events/july-2023-computer-vision-meetup/
Unleashing the Potential of Visual Data: Vector Databases in Computer Vision. Discover the game-changing role of vector databases in computer vision applications. These specialized databases excel at handling unstructured visual data, thanks to their robust support for embeddings and lightning-fast similarity search. Join us as we explore advanced indexing algorithms and showcase real-world examples in healthcare, retail, finance, and more using the FiftyOne engine combined with the Milvus vector database. See how vector databases unlock the full potential of your visual data. Speaker: Filip Haltmayer is a Software Engineer at Zilliz working in both software and community development.
Computer Vision Applications at Scale with Vector Databases. Vector databases enable semantic search at scale over hundreds of millions of unstructured data objects. In this talk I will introduce how you can use multi-modal encoders with the Weaviate vector database to semantically search over images and text. This will include demos across multiple domains including e-commerce and healthcare. Speaker: Zain Hasan is a senior developer advocate at Weaviate, an open source vector database.
Reverse Image Search for Ecommerce Without Going Crazy. Traditional full-text-based search engines have been on the market for a while and we are all currently trying to extend them with semantic search. Still, it might be more beneficial for some ecommerce businesses to introduce reverse image search capabilities instead of relying on text only. However, semantic search and reverse image search can and should coexist! You may encounter common pitfalls while implementing both, so why don't we discuss the best practices? Let's learn how to extend your existing search system with reverse image search, without getting lost in the process! Speaker: Kacper Łukawski is a Developer Advocate at Qdrant, an open-source neural search engine.
Fast and Flexible Data Discovery & Mining for Computer Vision at Petabyte Scale. Improving model performance requires methods to discover computer vision data, sometimes from large repositories, whether it's similar examples to errors previously seen, new examples/scenarios, or more advanced techniques such as active learning and RLHF. LanceDB makes this fast and flexible for multi-modal data, with support for vector search, SQL, Pandas, Polars, Arrow and a growing ecosystem of tools that you're familiar with. We'll walk through some common search examples and show how you can find needles in a haystack to improve your metrics! Speaker: Jai Chopra is Head of Product at LanceDB.
How-To Build Scalable Image and Text Search for Computer Vision Data using Pinecone and Qdrant. Have you ever wanted to find the images most similar to an image in your dataset? What if you haven't picked out an illustrative image yet, but you can describe what you are looking for using natural language? And what if your dataset contains millions, or tens of millions of images? In this talk Jacob will show you step-by-step how to integrate all the technology required to enable search for similar images, search with natural language, plus scaling the searches with Pinecone and Qdrant. He'll dive deep into the tech and show you a variety of practical examples that can help transform the way you manage your image data. Speaker: Jacob Marks is a Machine Learning Engineer and Developer Evangelist at Voxel51. |
July 2023 Computer Vision Meetup (Virtual - EU and Americas)
|