Search – talk-data.com

Title & Speakers	Event
MLOps London March - Talks on ML Inference and Vector Databases 2024-03-12 · 18:00 📽️ This session will be recorded and uploaded to YouTube within 48 hours after it finishes. MLOps London is back again in March 2024 with talks on production machine learning, Databases, LLMs, DevOps, and Data Science. Unusually, the plan is to run a virtual-only event this time. AGENDA: ⏱️ 6:00 pm 🎤 How to scale and secure ML inference right alongside your data 🧔🏻 Tobie Morgan Hitchcock -- CEO & Co-Founder of SurrealDB Using traditional ML training and models, learn how the secure and isolated Rust-based SurrealML environment within SurrealDB can help developers and organisations achieve greater efficiency and security with ML inferencing. At the same time, we’ll introduce methods of simplifying machine learning pipelines within organisations, enabling developers to build advanced applications quicker and bring machine learning logic right alongside critical data. ⏱️ 7:00 pm 🎤 Deconstructing Embedding Models 🧔🏻 Kacper Łukawski -- Developer Advocate at Qdrant We will delve deep into the tokenizer's fundamental role, shedding light on its operations and introducing straightforward techniques to assess whether a particular model is suited to your data based solely on its tokenizer. We will explore the significance of the tokenizer in the fine-tuning process of embedding models and discuss strategic approaches to optimize its effectiveness.	MLOps London March - Talks on ML Inference and Vector Databases
Kacper Łukawski: The Challenges of Making Vector Search Billion-scale 2023-12-04 · 12:01 Kacper Łukawski – guest @ Qdrant Join Kacper Łukawski as he delves into 'The Challenges of Making Vector Search Billion-scale.' 🔍🌐 Explore the intricacies of semantic search with large-scale embeddings and discover the lessons learned from scaling a vector database at Qdrant. Dive deep into design choices and the robust infrastructure behind them in this enlightening session.💡🚀 #VectorSearch #Scaling #semantics ✨ H I G H L I G H T S ✨ 🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍 Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️ Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear Big Data Vector DB	DATA MINER Big Data Europe Conference 2020 YouTube
July 2023 Computer Vision Meetup (Virtual - EU and Americas) 2023-07-13 · 17:00 Zoom Link https://voxel51.com/computer-vision-events/july-2023-computer-vision-meetup/ Unleashing the Potential of Visual Data: Vector Databases in Computer Vision Discover the game-changing role of vector databases in computer vision applications. These specialized databases excel at handling unstructured visual data, thanks to their robust support for embeddings and lightning-fast similarity search. Join us as we explore advanced indexing algorithms and showcase real-world examples in healthcare, retail, finance, and more using the FiftyOne engine combined with the Milvus vector database. See how vector databases unlock the full potential of your visual data. Speaker Filip Haltmayer is a Software Engineer at Zilliz working in both software and community development. Computer Vision Applications at Scale with Vector Databases Vector Databases enable semantic search at scale over hundreds of millions of unstructured data objects. In this talk I will introduce how you can use multi-modal encoders with the Weaviate vector database to semantically search over images and text. This will include demos across multiple domains including e-commerce and healthcare. Speaker Zain Hasan is a senior developer advocate at Weaviate, an open source vector database. Reverse Image Search for Ecommerce Without Going Crazy Traditional full-text-based search engines have been on the market for a while and we are all currently trying to extend them with semantic search. Still, it might be more beneficial for some ecommerce businesses to introduce reverse image search capabilities instead of relying on text only. However, both semantic search and reverse image may and should coexist! You may encounter common pitfalls while implementing both, so why don't we discuss the best practices? Let's learn how to extend your existing search system with reverse image search, without getting lost in the process! 
Speaker Kacper Łukawski is a Developer Advocate at Qdrant - an open-source neural search engine. Fast and Flexible Data Discovery & Mining for Computer Vision at Petabyte Scale Improving model performance requires methods to discover computer vision data, sometimes from large repositories, whether it's similar examples to errors previously seen, new examples/scenarios or more advanced techniques such as active learning and RLHF. LanceDB makes this fast and flexible for multi-modal data, with support for vector search, SQL, Pandas, Polars, Arrow and a growing ecosystem of tools that you're familiar with. We'll walk through some common search examples and show how you can find needles in a haystack to improve your metrics! Speaker Jai Chopra is Head of Product at LanceDB. How-To Build Scalable Image and Text Search for Computer Vision Data using Pinecone and Qdrant Have you ever wanted to find the images most similar to an image in your dataset? What if you haven’t picked out an illustrative image yet, but you can describe what you are looking for using natural language? And what if your dataset contains millions, or tens of millions of images? In this talk Jacob will show you step-by-step how to integrate all the technology required to enable search for similar images, search with natural language, plus scaling the searches with Pinecone and Qdrant. He’ll dive deep into the tech and show you a variety of practical examples that can help transform the way you manage your image data. Speaker Jacob Marks is a Machine Learning Engineer and Developer Evangelist at Voxel51.	July 2023 Computer Vision Meetup (Virtual - EU and Americas)

Berlin Buzzwords 2025 Conference Interviews 2025-09-12 · 17:00

Kacper Łukawski – guest @ Qdrant, Filip Makraduli – Founding ML DevRel engineer @ Superlinked, Atita Arora – guest, Brian Goldin – Founder and CEO @ Voyager Search, André Charton – Head of Search @ Kleinanzeigen, Manish Gill – Engineering Manager @ ClickHouse

At Berlin Buzzwords, industry voices highlighted how search is evolving with AI and LLMs.

Kacper Łukawski (Qdrant) stressed hybrid search (semantic + keyword) as core for RAG systems and promoted efficient embedding models for smaller-scale use.
Manish Gill (ClickHouse) discussed auto-scaling OLAP databases on Kubernetes, combining infrastructure and database knowledge.
André Charton (Kleinanzeigen) reflected on scaling search for millions of classifieds, moving from Solr/Elasticsearch toward vector search, while returning to a hands-on technical role.
Filip Makraduli (Superlinked) introduced a vector-first framework that fuses multiple encoders into one representation for nuanced e-commerce and recommendation search.
Brian Goldin (Voyager Search) emphasized spatial context in retrieval, combining geospatial data with AI enrichment to add the “where” to search.
Atita Arora (Voyager Search) highlighted geospatial AI models, the renewed importance of retrieval in RAG, and the cautious but promising rise of AI agents.

Together, their perspectives show a common thread: search is regaining center stage in AI—scaling, hybridization, multimodality, and domain-specific enrichment are shaping the next generation of retrieval systems.

Kacper Łukawski
Senior Developer Advocate at Qdrant, he educates users on vector and hybrid search. He highlighted Qdrant’s support for dense and sparse vectors, the role of search with LLMs, and his interest in cost-effective models like static embeddings for smaller companies and edge apps.
Connect: https://www.linkedin.com/in/kacperlukawski/

Manish Gill
Engineering Manager at ClickHouse, he spoke about running ClickHouse on Kubernetes, tackling auto-scaling and stateful sets. His team focuses on making ClickHouse scale automatically in the cloud. He credited its speed to careful engineering and reflected on the shift from IC to manager.
Connect: https://www.linkedin.com/in/manishgill/

André Charton
Head of Search at Kleinanzeigen, he discussed shaping the company’s search tech—moving from Solr to Elasticsearch and now vector search with Vespa. Kleinanzeigen handles 60M items, 1M new listings daily, and 50k requests/sec. André explained his career shift back to hands-on engineering.
Connect: https://www.linkedin.com/in/andrecharton/

Filip Makraduli
Founding ML DevRel engineer at Superlinked, an open-source framework for AI search and recommendations. Its vector-first approach fuses multiple encoders (text, images, structured fields) into composite vectors for single-shot retrieval. His Berlin Buzzwords demo showed e-commerce search with natural-language queries and filters.
Connect: https://www.linkedin.com/in/filipmakraduli/

Brian Goldin
Founder and CEO of Voyager Search, which began with geospatial search and expanded into documents and metadata enrichment. Voyager indexes spatial data and enriches pipelines with NLP, OCR, and AI models to detect entities like oil spills or windmills. He stressed adding spatial context (“the where”) as critical for search and highlighted Voyager’s 12 years of enterprise experience.
Connect: https://www.linkedin.com/in/brian-goldin-04170a1/

Atita Arora
Director of AI at Voyager Search, with nearly 20 years in retrieval systems, now focused on geospatial AI for Earth observation data. At Berlin Buzzwords she hosted sessions, attended talks on Lucene, GPUs, and Solr, and emphasized retrieval quality in RAG systems. She is cautiously optimistic about AI agents and values the event as both a learning hub and a professional reunion.
Connect: https://www.linkedin.com/in/atitaarora/

AI/ML ClickHouse Cloud Computing ELK Kubernetes LLM NLP RAG

DataTalks.Club

Listen

Streamlining Data Pipelines with MCP Servers and Vector Engines 2025-07-15 · 02:04

Kacper Łukawski – guest @ Qdrant, Tobias Macey – host

Summary
In this episode of the Data Engineering Podcast, Kacper Łukawski from Qdrant talks about integrating MCP servers with vector databases to process unstructured data. Kacper shares his experience in data engineering, from building big data pipelines in the automotive industry to leveraging large language models (LLMs) for transforming unstructured datasets into valuable assets. He discusses the challenges of building data pipelines for unstructured data and how vector databases facilitate semantic search and retrieval-augmented generation (RAG) applications. Kacper delves into the intricacies of vector storage and search, including metadata and contextual elements, and explores the evolution of vector engines beyond RAG to applications like semantic search and anomaly detection. The conversation covers the role of Model Context Protocol (MCP) servers in simplifying data integration and retrieval processes, highlights the need for experimentation and evaluation when adopting LLMs, and offers practical advice on optimizing vector search costs and fine-tuning embedding models for improved search quality.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey and today I'm interviewing Kacper Łukawski about how MCP servers can be paired with vector databases to streamline processing of unstructured data.

Interview
Introduction
How did you get involved in the area of data management?
LLMs are enabling the derivation of useful data assets from unstructured sources. What are the challenges that teams face in building the pipelines to support that work?
How has the role of vector engines grown or evolved in the past ~2 years as LLMs have gained broader adoption?
Beyond its role as a store of context for agents, RAG, etc., what other applications are common for vector databases?
In the ecosystem of vector engines, what are the distinctive elements of Qdrant?
How has the MCP specification simplified the work of processing unstructured data?
Can you describe the toolchain and workflow involved in building a data pipeline that leverages an MCP for generating embeddings?
Helping data engineers gain confidence in non-deterministic workflows
Bringing application/ML/data teams into collaboration for determining the impact of e.g. chunking strategies, embedding model selection, etc.
What are the most interesting, innovative, or unexpected ways that you have seen MCP and Qdrant used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on vector use cases?
When is MCP and/or Qdrant the wrong choice?
What do you have planned for the future of MCP with Qdrant?

Contact Info
LinkedIn, Twitter/X, Personal website

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
Qdrant, Kafka, Apache Oozie, Named Entity Recognition, GraphRAG, pgvector, Elasticsearch, Apache Lucene, OpenSearch, BM25, Semantic Search, MCP == Model Context Protocol, Anthropic Contextualized Chunking, Cohere

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Big Data Data Engineering Data Management Datafold LLM Python RAG Vector DB

Data Engineering Podcast

Listen

MLOps London March - Talks on ML Inference and Vector Databases 2024-03-12 · 18:00

📽️ This session will be recorded and uploaded to YouTube within 48 hours after it finishes.

MLOps London is back in March 2024 with talks on production machine learning, databases, LLMs, DevOps, and data science. Unlike our usual format, this will be a virtual-only event.

AGENDA: ⏱️ 6:00 pm 🎤 How to scale and secure ML inference right alongside your data 🧔🏻 Tobie Morgan Hitchcock -- CEO & Co-Founder of SurrealDB

Using traditional ML training and models, learn how the secure and isolated Rust-based SurrealML environment within SurrealDB can help developers and organisations achieve greater efficiency and security with ML inferencing. At the same time, we’ll introduce methods of simplifying machine learning pipelines within organisations, enabling developers to build advanced applications quicker and bring machine learning logic right alongside critical data.

⏱️ 7:00 pm 🎤 Deconstructing Embedding Models 🧔🏻 Kacper Łukawski -- Developer Advocate at Qdrant

We will delve deep into the tokenizer's fundamental role, shedding light on its operations and introducing straightforward techniques to assess whether a particular model is suited to your data based solely on its tokenizer. We will explore the significance of the tokenizer in the fine-tuning process of embedding models and discuss strategic approaches to optimize its effectiveness.
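The abstract does not say which techniques the talk uses, so the following is only a minimal sketch of one plausible tokenizer-only check: measure how often words from your own data fall outside the vocabulary and how many subword pieces each word is split into. The greedy longest-match splitter, the `TOY_VOCAB` set, and the function names are all hypothetical and stand in for a real model's tokenizer and vocabulary.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


def tokenizer_fit(texts, vocab):
    """Report unknown-word rate and average pieces per word for a sample."""
    words = [w for text in texts for w in text.lower().split()]
    if not words:
        return {"unknown_word_rate": 0.0, "tokens_per_word": 0.0}
    tokenized = [wordpiece_tokenize(w, vocab) for w in words]
    unknown = sum(1 for toks in tokenized if toks == ["[UNK]"])
    total_tokens = sum(len(toks) for toks in tokenized)
    return {
        "unknown_word_rate": unknown / len(words),
        "tokens_per_word": total_tokens / len(words),
    }


# Hypothetical toy vocabulary; a real check would load the model's own.
TOY_VOCAB = {"vector", "search", "data", "##base", "##s", "em", "##bed", "##ding"}
report = tokenizer_fit(["vector databases", "embeddings search", "qdrant"], TOY_VOCAB)
print(report)
```

A high unknown-word rate or a high pieces-per-word average on domain text would suggest the vocabulary fits the data poorly, although the thresholds that matter depend on the model and the task.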

MLOps London March - Talks on ML Inference and Vector Databases

Kacper Łukawski: The Challenges of Making Vector Search Billion-scale 2023-12-04 · 12:01

Kacper Łukawski – guest @ Qdrant

Join Kacper Łukawski as he delves into 'The Challenges of Making Vector Search Billion-scale.' 🔍🌐 Explore the intricacies of semantic search with large-scale embeddings and discover the lessons learned from scaling a vector database at Qdrant. Dive deep into design choices and the robust infrastructure behind them in this enlightening session.💡🚀 #VectorSearch #Scaling #semantics

✨ H I G H L I G H T S ✨

🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍

Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️

Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear

Big Data Vector DB

DATA MINER Big Data Europe Conference 2020

YouTube

July 2023 Computer Vision Meetup (Virtual - EU and Americas) 2023-07-13 · 17:00

Zoom Link

https://voxel51.com/computer-vision-events/july-2023-computer-vision-meetup/

Unleashing the Potential of Visual Data: Vector Databases in Computer Vision

Discover the game-changing role of vector databases in computer vision applications. These specialized databases excel at handling unstructured visual data, thanks to their robust support for embeddings and lightning-fast similarity search. Join us as we explore advanced indexing algorithms and showcase real-world examples in healthcare, retail, finance, and more using the FiftyOne engine combined with the Milvus vector database. See how vector databases unlock the full potential of your visual data.

Speaker

Filip Haltmayer is a Software Engineer at Zilliz working in both software and community development.

Computer Vision Applications at Scale with Vector Databases

Vector Databases enable semantic search at scale over hundreds of millions of unstructured data objects. In this talk I will introduce how you can use multi-modal encoders with the Weaviate vector database to semantically search over images and text. This will include demos across multiple domains including e-commerce and healthcare.

Speaker

Zain Hasan is a senior developer advocate at Weaviate, an open source vector database.

Reverse Image Search for Ecommerce Without Going Crazy

Traditional full-text search engines have been on the market for a while, and we are all currently trying to extend them with semantic search. Still, it might be more beneficial for some ecommerce businesses to introduce reverse image search capabilities instead of relying on text only. However, semantic search and reverse image search can and should coexist! You may encounter common pitfalls while implementing both, so why don't we discuss the best practices? Let's learn how to extend your existing search system with reverse image search, without getting lost in the process!

Speaker

Kacper Łukawski is a Developer Advocate at Qdrant - an open-source neural search engine.

Fast and Flexible Data Discovery & Mining for Computer Vision at Petabyte Scale

Improving model performance requires methods to discover computer vision data, sometimes from large repositories, whether it's similar examples to errors previously seen, new examples/scenarios, or more advanced techniques such as active learning and RLHF. LanceDB makes this fast and flexible for multi-modal data, with support for vector search, SQL, Pandas, Polars, Arrow and a growing ecosystem of tools that you're familiar with. We'll walk through some common search examples and show how you can find needles in a haystack to improve your metrics!

Speaker

Jai Chopra is Head of Product at LanceDB

How-To Build Scalable Image and Text Search for Computer Vision Data using Pinecone and Qdrant

Have you ever wanted to find the images most similar to an image in your dataset? What if you haven’t picked out an illustrative image yet, but you can describe what you are looking for using natural language? And what if your dataset contains millions, or tens of millions, of images? In this talk Jacob will show you step by step how to integrate all the technology required to enable search for similar images and search with natural language, and how to scale the searches with Pinecone and Qdrant. He’ll dive deep into the tech and show you a variety of practical examples that can help transform the way you manage your image data.

Speaker

Jacob Marks is a Machine Learning Engineer and Developer Evangelist at Voxel51.
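The similar-image search described in the talks above comes down to comparing embedding vectors. As a minimal sketch, and not the method shown in any of these sessions, the brute-force version below ranks items by cosine similarity to a query vector. The three-dimensional vectors, the file names, and the `top_k` helper are made up for illustration; a real system would get embeddings from an image or text encoder and delegate the search to a vector database's approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), item_id)
              for item_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]


# Hypothetical hand-written embeddings standing in for encoder output.
INDEX = {
    "red_shoe.jpg": [0.9, 0.1, 0.0],
    "red_boot.jpg": [0.8, 0.3, 0.1],
    "blue_lamp.jpg": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], INDEX, k=2)
print(results)
```

Because a text query and an image can share one embedding space when a multi-modal encoder is used, the same ranking step can serve both search by example image and search by natural-language description.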

July 2023 Computer Vision Meetup (Virtual - EU and Americas)
