Building Agentic AI: Workflows, Fine-Tuning, Optimization, and Deployment

2025-12-08 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Sinan Ozdemir (LoopGenius)

AI/ML LLM RAG ai-ml artificial-intelligence-ai data generative-ai

Transform Your Business with Intelligent AI to Drive Outcomes Building reactive AI applications and chatbots is no longer enough. The competitive advantage belongs to those who can build AI that can respond, reason, plan, and execute. Building Agentic AI: Workflows, Fine-Tuning, Optimization, and Deployment takes you beyond basic chatbots to create fully functional, autonomous agents that automate real workflows, enhance human decision-making, and drive measurable business outcomes across high-impact domains like customer support, finance, and research. Whether you're a developer deploying your first model, a data scientist exploring multi-agent systems and distilled LLMs, or a product manager integrating AI workflows and embedding models, this practical handbook provides tried and tested blueprints for building production-ready systems. Harness the power of reasoning models for applications like computer use, multimodal systems to work with all kinds of data, and fine-tuning techniques to get the most out of AI. Learn to test, monitor, and optimize agentic systems to keep them reliable and cost-effective at enterprise scale. Master the complete agentic AI pipeline Design adaptive AI agents with memory, tool use, and collaborative reasoning capabilities Build robust RAG workflows using embeddings, vector databases, and LangGraph state management Implement comprehensive evaluation frameworks beyond accuracy, including precision, recall, and latency metrics Deploy multimodal AI systems that seamlessly integrate text, vision, audio, and code generation Optimize models for production through fine-tuning, quantization, and speculative decoding techniques Navigate the bleeding edge of reasoning LLMs and computer-use capabilities Balance cost, speed, accuracy, and privacy in real-world deployment scenarios Create hybrid architectures that combine multiple agents for complex enterprise applications Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

AWS re:Invent 2025 - Build gpu-boosted, auto-optimized, billion-scale VectorDBs in hours (ANT213)

2025-12-07 · AWS re:Invent 2024 Watch

video

Agile/Scrum AI/ML AWS Cloud Computing GenAI

Amazon OpenSearch Service lets you search billions of vectors in milliseconds and with high accuracy to support semantic search and power generative AI. Learn how we're democratizing vector search and accelerating AI application development with vector index GPU-acceleration and auto-optimization on Amazon OpenSearch Service. These new features allow you to build billion-scale vector database in under an hour, and index vectors 10x faster at only a quarter of the cost, while auto-optimizing for search speed, quality and cost savings.

Learn More: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 -What’s new in search, observability, and vector databases w/ OpenSearch (ANT201)

2025-12-05 · AWS re:Invent 2024 Watch

video

Agile/Scrum AI/ML Analytics AWS Cloud Computing Data Management S3

Discover the latest Amazon OpenSearch Service launches and capabilities that enable and quickly deploy agentic AI applications and vector search operations. Learn how new integrations with Amazon Q enable intelligent data discovery and automated insights, while enhanced Amazon S3 connectivity streamlines data management. This session showcases how our latest vector database optimizations accelerate AI/ML workloads for efficient development of agentic AI, semantic search, and recommendation systems. We'll demonstrate new cost optimization features and performance enhancements across all OpenSearch use cases, including significant updates to Observability. Whether you're building next-generation AI applications or scaling your existing search infrastructure, join us for a comprehensive update on new launches and releases that can transform your search and analytics capabilities.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

Qdrant 2025 Conference Interviews

2025-11-28 · DataTalks.Club Listen

podcast_episode

by Evgeniya (Jenny) Sukhodolskaya (Qdrant) , Slava Dubrov (HubSpot) , Andrey Vasnetsov (Qdrant) , Marina Ariamnova (SumUp)

AI/ML Analytics Hubspot LLM SQL

At Qdrant Conference, builders, researchers, and industry practitioners shared how vector search, retrieval infrastructure, and LLM-driven workflows are evolving across developer tooling, AI platforms, analytics teams, and modern search research.

Andrey Vasnetsov (Qdrant) explained how Qdrant was born from the need to combine database-style querying with vector similarity search—something he first built during the COVID lockdowns. He highlighted how vector search has shifted from an ML specialty to a standard developer tool and why hosting an in-person conference matters for gathering honest, real-time feedback from the growing community.

Slava Dubrov (HubSpot) described how his team uses Qdrant to power AI Signals, a platform for embeddings, similarity search, and contextual recommendations that support HubSpot’s AI agents. He shared practical use cases like look-alike company search, reflected on evaluating agentic frameworks, and offered career advice for engineers moving toward technical leadership.

Marina Ariamnova (SumUp) presented her internally built LLM analytics assistant that turns natural-language questions into SQL, executes queries, and returns clean summaries—cutting request times from days to minutes. She discussed balancing analytics and engineering work, learning through real projects, and how LLM tools help analysts scale routine workflows without replacing human expertise.

Evgeniya (Jenny) Sukhodolskaya (Qdrant) discussed the multi-disciplinary nature of DevRel and her focus on retrieval research. She shared her work on sparse neural retrieval, relevance feedback, and hybrid search models that blend lexical precision with semantic understanding—contributing methods like Mini-COIL and shaping Qdrant’s search quality roadmap through end-to-end experimentation and community education.

Speakers

Andrey Vasnetsov Co-founder & CTO of Qdrant, leading the engineering and platform vision behind a developer-focused vector database and vector-native infrastructure. Connect: https://www.linkedin.com/in/andrey-vasnetsov-75268897/

Slava Dubrov Technical Lead at HubSpot working on AI Signals—embedding models, similarity search, and context systems for AI agents. Connect: https://www.linkedin.com/in/slavadubrov/

Marina Ariamnova Data Lead at SumUp, managing analytics and financial data workflows while prototyping LLM tools that automate routine analysis. Connect: https://www.linkedin.com/in/marina-ariamnova/

Evgeniya (Jenny) Sukhodolskaya Developer Relations Engineer at Qdrant specializing in retrieval research, sparse neural methods, and educational ML content. Connect: https://www.linkedin.com/in/evgeniya-sukhodolskaya/

Building a multimodal lakehouse for AI (w/ Chang She)

2025-11-23 · The Analytics Engineering Podcast Listen

podcast_episode

by Tristan Handy (dbt Labs) , Chang She (LanceDB)

AI/ML Analytics Analytics Engineering Data Lake Data Lakehouse dbt Lance Pandas

In this episode, Tristan Handy sits down with Chang She — a co-creator of Pandas and now CEO of LanceDB — to explore the convergence of analytics and AI engineering. The team at LanceDB is rebuilding the data lake from the ground up with AI as a first principle, starting with a new AI-native file format called Lance. Tristan traces Chang's journey as one of the original contributors to the pandas library to building a new infrastructure layer for AI-native data. Learn why vector databases alone aren't enough, why agents require new architecture, and how LanceDB is building a AI lakehouse for the future. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Vector DBs, Knowledge Graphs, Data Fabric, and Why Process Still Rules (w/ Terry Dorsey)

2025-11-21 · Mavens of Data Listen

podcast_episode

by Terry Dorsey (Denodo)

AI/ML Analytics LLM Fabric

In this episode, we're joined by Terry Dorsey, Senior Data Architect & Evangelist at Denodo, to unpack the conceptual differences between terms like data fabrics, vector databases, and knowledge graphs, and remind you not to forget about the importance of structured data in this new AI-native world! What You'll Learn: The difference between data fabrics, vector databases, and knowledge graphs — and the pros and cons Why organizing and managing data is still the hardest part of any AI project (and how process design plays a critical role) Why structured data and schemas are still crucial in the age of LLMs and embeddings How knowledge graphs help model context, relationships, and "episodic memory" more completely than other approaches If you've ever wondered about different data and AI terms, here's a great glossary to check out from Denodo: https://www.denodo.com/en/glossary 🤝 Follow Terry on LinkedIn! Register for free to be part of the next live session: https://bit.ly/3XB3A8b Follow us on Socials: LinkedIn YouTube Instagram (Mavens of Data) Instagram (Maven Analytics) TikTok Facebook Medium X/Twitter

Build powerful, smarter agents with Elasticsearch and Microsoft Foundry

2025-11-19 · Microsoft Ignite 2025

theater

by Rohit Tatachar (Microsoft) , Sunile Manjee (Elastic)

AI/ML Azure ELK Microsoft

Discover how Elasticsearch’s powerful vector database capabilities and the robust Azure AI Foundry Agent Framework combine to power smarter agents. View how to synthesize information from diverse data sources and how to use A2A and MCP for orchestrating complex tasks. Learn how to design and apply intelligent agents for scalable information retrieval, task coordination, and integration across systems.

How to make datamap web-apps of embedding vectors via open source tooling

2025-11-09 · PyData Seattle 2025

talk

by John Tigue

AI/ML API Python RAG

Datamaps are ML-powered visualizations of high-dimensional data, and in this talk the data is collections of embedding vectors. Interactive datamaps run in-browser as web-apps, potentially without any code running on the web server. Datamap tech can be used to visualize, say, the entire collection of chunks in a RAG vector database.

The best-of-breed tools of this new datamap technique are liberally licensed open source. This presentation is an introduction to building with those repos. The maths will be mentioned only in passing; the topic here is simply how-to with specific tools. Talk attendees will be learning about Python tools, which produce high-quality web UIs.

DataMapPlot is the premiere tool for rendering a datamap as a web-app. Here is a live demo thereof: https://connoiter.com/datamap/cff30bc1-0576-44f0-a07c-60456e131b7b

00-25: Intro to datamaps 25-45: Pipeline architecture 45-55: demos touring such tools as UMAP, HDBSCAN, DataMapPlot, Toponomy, etc. 55-90: Group coding

A Google account is required to log in to Google Colab, where participants can run the workshop notebooks. A Hugging Face API key (token) is needed to download Gemma models.

Securing Retrieval-Augmented Generation: How to Defend Vector Databases Against 2025 Threats

2025-11-08 · PyData Seattle 2025 Watch

talk

by Rajesh

LLM RAG Cyber Security

Modern LLM applications rely heavily on embeddings and vector databases for retrieval-augmented generation (RAG). But in 2025, researchers and OWASP flagged vector databases as a new attack surface — from embedding inversion (recovering sensitive training text) to poisoned vectors that hijack prompts. This talk demystifies these threats for practitioners and shows how to secure your RAG pipeline with real-world techniques like encrypted stores, anomaly detection, and retrieval validation. Attendees will leave with a practical security checklist for keeping embeddings safe while still unlocking the power of retrieval.

Architecting Scalable Multi-Modal Video Search

2025-10-01 · PyData Paris 2025 Watch

talk

by Pietro Piccini , Sebastiano Milardo

AI/ML

The exponential growth of video data presents significant challenges for effective content discovery. Traditional keyword search falls short when dealing with visual nuances. This talk addresses the design and implementation of a robust system for large-scale, multi-modal video retrieval, enabling search across petabytes of data using diverse inputs like text descriptions (e.g., appearance, actions) and query images (e.g., faces). We will explore an architecture combining efficient batch preprocessing for feature extraction (including person detection, face/CLIP-style embeddings) with optimized vector database indexing. Attendees will learn about strategies for managing massive datasets, optimizing ML inference pipelines for speed and cost-efficiency (touching upon lightweight models and specialized runtimes), and building interactive systems that bridge pre-computed indexes with real-time analysis capabilities for enhanced insights.

Large-Scale Video Intelligence

2025-09-25 · PyData Amsterdam 2025 Watch

talk

by Antonino Ingargiola , Irene Donato

AI/ML Python

The explosion of video data demands search beyond simple metadata. How do we find specific visual moments, actions, or faces within petabytes of footage? This talk dives into architecting a robust, scalable multi-modal video search system. We will explore an architecture combining efficient batch preprocessing for feature extraction (including person detection, face/CLIP-style embeddings) with optimized vector database indexing. Attendees will learn practical strategies for managing massive datasets, optimizing ML inference (e.g., lightweight models, specialized runtimes), and bridging pre-computed indexes with real-time analysis for deeper insights. This session is for data scientists, ML engineers, and architects looking to build sophisticated video understanding capabilities.

Audience: Data Scientists, Machine Learning Engineers, Data Engineers, System Architects.

Takeaway: Attendees will learn architectural patterns and practical techniques for building scalable multi-modal video search systems, including feature extraction, vector database utilization, and ML pipeline optimization.

Background Knowledge: Familiarity with Python, core machine learning concepts (e.g., embeddings, classification), and general data processing pipelines is beneficial. Experience with video processing or computer vision is a plus but not strictly required.

Navigating healthcare scientific knowledge:building AI agents for accurate biomedical data retrieval

2025-09-02 · PyData Berlin 2025 Watch

talk

by Laura Dumont

AI/ML LLM NLP Python SQL

With a focus on healthcare applications where accuracy is non negotiable, this talk highlights challenges and delivers practical insights on building AI agents which query complex biological and scientific data to answer sophisticated questions. Drawing from our experience developing Owkin-K Navigator, a free-to-use AI co-pilot for biological research, I'll share hard-won lessons about combining natural language processing with SQL querying and vector database retrieval to navigate large biomedical knowledge sources, addressing challenges of preventing hallucinations and ensuring proper source attribution. This session is ideal for data scientists, ML engineers, and anyone interested in applying python and LLM ecosystem to the healthcare domain.

Streamlining Data Pipelines with MCP Servers and Vector Engines

2025-07-15 · Data Engineering Podcast Listen

podcast_episode

by Kacper Łukawski (Qdrant) , Tobias Macey

AI/ML Big Data Data Engineering Data Management Datafold LLM Python RAG

Summary In this episode of the Data Engineering Podcast Kacper Łukawski from Qdrant about integrating MCP servers with vector databases to process unstructured data. Kacper shares his experience in data engineering, from building big data pipelines in the automotive industry to leveraging large language models (LLMs) for transforming unstructured datasets into valuable assets. He discusses the challenges of building data pipelines for unstructured data and how vector databases facilitate semantic search and retrieval-augmented generation (RAG) applications. Kacper delves into the intricacies of vector storage and search, including metadata and contextual elements, and explores the evolution of vector engines beyond RAG to applications like semantic search and anomaly detection. The conversation covers the role of Model Context Protocol (MCP) servers in simplifying data integration and retrieval processes, highlighting the need for experimentation and evaluation when adopting LLMs, and offering practical advice on optimizing vector search costs and fine-tuning embedding models for improved search quality.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Your host is Tobias Macey and today I'm interviewing Kacper Łukawski about how MCP servers can be paired with vector databases to streamline processing of unstructured dataInterview IntroductionHow did you get involved in the area of data management?LLMs are enabling the derivation of useful data assets from unstructured sources. What are the challenges that teams face in building the pipelines to support that work?How has the role of vector engines grown or evolved in the past ~2 years as LLMs have gained broader adoption?Beyond its role as a store of context for agents, RAG, etc. what other applications are common for vector databaes?In the ecosystem of vector engines, what are the distinctive elements of Qdrant?How has the MCP specification simplified the work of processing unstructured data?Can you describe the toolchain and workflow involved in building a data pipeline that leverages an MCP for generating embeddings?helping data engineers gain confidence in non-deterministic workflowsbringing application/ML/data teams into collaboration for determining the impact of e.g. chunking strategies, embedding model selection, etc.What are the most interesting, innovative, or unexpected ways that you have seen MCP and Qdrant used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on vector use cases?When is MCP and/or Qdrant the wrong choice?What do you have planned for the future of MCP with Qdrant?Contact Info LinkedInTwitter/XPersonal websiteParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links QdrantKafkaApache OoziNamed Entity RecognitionGraphRAGpgvectorElasticsearchApache LuceneOpenSearchBM25Semantic SearchMCP == Model Context ProtocolAnthropic Contextualized ChunkingCohereThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored by: Cognizant | How Cognizant Helped RJR Transform Market Intelligence with GenAI

AI and the Lakehouse: How Starburst is Pioneering New Workflows

2025-06-11 · Data Engineering Podcast Listen

podcast_episode

by Tobias Macey , Alex Albu (Starburst)

AI/ML Analytics API Dashboard Data Collection Data Contracts Data Engineering Data Lakehouse Data Management Data Quality Datafold Iceberg +6 more

Summary In this episode of the Data Engineering Podcast Alex Albu, tech lead for AI initiatives at Starburst, talks about integrating AI workloads with the lakehouse architecture. From his software engineering roots to leading data engineering efforts, Alex shares insights on enhancing Starburst's platform to support AI applications, including an AI agent for data exploration and using AI for metadata enrichment and workload optimization. He discusses the challenges of integrating AI with data systems, innovations like SQL functions for AI tasks and vector databases, and the limitations of traditional architectures in handling AI workloads. Alex also shares his vision for the future of Starburst, including support for new data formats and AI-driven data exploration tools.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.This is a pharmaceutical Ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome — also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business — automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like — so you can stop fighting over column names, and start trusting your data again.Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: Increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today to get a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th. This episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial.Your host is Tobias Macey and today I'm interviewing Alex Albu about how Starburst is extending the lakehouse to support AI workloadsInterview IntroductionHow did you get involved in the area of data management?Can you start by outlining the interaction points of AI with the types of data workflows that you are supporting with Starburst?What are some of the limitations of warehouse and lakehouse systems when it comes to supporting AI systems?What are the points of friction for engineers who are trying to employ LLMs in the work of maintaining a lakehouse environment?Methods such as tool use (exemplified by MCP) are a means of bolting on AI models to systems like Trino. What are some of the ways that is insufficient or cumbersome?Can you describe the technical implementation of the AI-oriented features that you have incorporated into the Starburst platform?What are the foundational architectural modifications that you had to make to enable those capabilities?For the vector storage and indexing, what modifications did you have to make to iceberg?What was your reasoning for not using a format like Lance?For teams who are using Starburst and your new AI features, what are some examples of the workflows that they can expect?What new capabilities are enabled by virtue of embedding AI features into the interface to the lakehouse?What are the most interesting, innovative, or unexpected ways that you have seen Starburst AI features used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on AI features for Starburst?When is Starburst/lakehouse the wrong choice for a given AI use case?What do you have planned for the future of AI on Starburst?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links StarburstPodcast EpisodeAWS AthenaMCP == Model Context ProtocolLLM Tool UseVector EmbeddingsRAG == Retrieval Augmented GenerationAI Engineering Podcast EpisodeStarburst Data ProductsLanceLanceDBParquetORCpgvectorStarburst IcehouseThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Gen AI Deployment and Monitoring

2025-06-10 · Data + AI Summit 2025

talk

AI/ML Data Lakehouse Databricks GenAI RAG

This course introduces learners to deploying, operationalizing, and monitoring generative artificial intelligence (AI) applications. First, learners will develop knowledge and skills in deploying generative AI applications using tools like Model Serving. Next, the course will discuss operationalizing generative AI applications following modern LLMOps best practices and recommended architectures. Finally, learners will be introduced to the idea of monitoring generative AI applications and their components using Lakehouse Monitoring. Pre-requisites: Familiarity with prompt engineering and retrieval-augmented generation (RAG) techniques, including data preparation, embeddings, vectors, and vector databases. A foundational knowledge of Databricks Data Intelligence Platform tools for evaluation and governance (particularly Unity Catalog). Labs: Yes Certification Path: Databricks Certified Generative AI Engineer Associate

LanceDB: A Complete Search and Analytical Store for Serving Production-scale AI Applications

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Chang She (LanceDB) , Zero Qu (Databricks)

AI/ML Lance

If you're building AI applications, chances are you're solving a retrieval problem somewhere along the way. This is why vector databases are popular today. But if we zoom out from just vector search, serving AI applications also requires handling KV workloads like a traditional feature store, as well as analytical workloads to explore and visualize data. This means that building an AI application often requires multiple data stores, which means multiple data copies, manual syncing, and extra infrastructure expenses. LanceDB is the first and only system that supports all of these workloads in one system. Powered by Lance columnar format, LanceDB completely breaks open the impossible triangle of performance, scalability, and cost for AI serving. Serving AI applications is different from previous waves of technology, and a new paradigm demands new tools.

Optimize Cost and User Value Through Model Routing AI Agent

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Aditya Gautam (Meta)

AI/ML API Data Collection Databricks LLM

Each LLM has unique strengths and weaknesses, and there is no one-size-fits-all solution. Companies strive to balance cost reduction with maximizing the value of their use cases by considering various factors such as latency, multi-modality, API costs, user need, and prompt complexity. Model routing helps in optimizing performance and cost along with enhanced scalability and user satisfaction. Overview of cost-effective models training using AI gateway logs, user feedback, prompt, and model features to design an intelligent model-routing AI agent. Covers different strategies for model routing, deployment in Mosaic AI, re-training, and evaluation through A/B testing and end-to-end Databricks workflows. Additionally, it will delve into the details of training data collection, feature engineering, prompt formatting, custom loss functions, architectural modifications, addressing cold-start problems, query embedding generation and clustering through VectorDB, and RL policy-based exploration.

Gen AI Evaluation and Governance

2025-06-10 · Data + AI Summit 2025

talk

AI/ML Databricks GenAI RAG Cyber Security

This course introduces learners to evaluating and governing GenAI (generative artificial intelligence) systems. First, learners will explore the meaning behind and motivation for building evaluation and governance/security systems. Next, the course will connect evaluation and governance systems to the Databricks Data Intelligence Platform. Third, learners will be introduced to a variety of evaluation techniques for specific components and types of applications. Finally, the course will conclude with an analysis of evaluating entire AI systems with respect to performance and cost. Pre-requisites: Familiarity with prompt engineering, and experience with the Databricks Data Intelligence Platform. Additionally, knowledge of retrieval-augmented generation (RAG) techniques including data preparation, embeddings, vectors, and vector databases Labs: Yes Certification Path: Databricks Certified Generative AI Engineer Associate

Gen AI Application Development

2025-06-09 · Data + AI Summit 2025

talk

AI/ML Databricks GenAI LLM NLP RAG

This course provides participants with information and practical experience in building advanced LLM (Large Language Model) applications using multi-stage reasoning LLM chains and agents. In the initial section, participants will learn how to decompose a problem into its components and select the most suitable model for each step to enhance business use cases. Following this, participants will construct a multi-stage reasoning chain utilizing LangChain and HuggingFace transformers. Finally, participants will be introduced to agents and will design an autonomous agent using generative models on Databricks. Pre-requisites: Solid understanding of natural language processing (NLP) concepts, familiarity with prompt engineering and prompt engineering best practices, experience with the Databricks Data Intelligence Platform, experience with retrieval-augmented generation (RAG) techniques including data preparation, building RAG architectures, and concepts like embeddings, vectors, and vector databases Labs: Yes Certification Path: Databricks Certified Generative AI Engineer Associate

talk-data.com

Vector DB

Activity Trend

Top Events

Top Speakers

Building Agentic AI: Workflows, Fine-Tuning, Optimization, and Deployment

AWS re:Invent 2025 - Build gpu-boosted, auto-optimized, billion-scale VectorDBs in hours (ANT213)

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 -What’s new in search, observability, and vector databases w/ OpenSearch (ANT201)

AWSreInvent #AWSreInvent2025 #AWS

Qdrant 2025 Conference Interviews

Building a multimodal lakehouse for AI (w/ Chang She)

Vector DBs, Knowledge Graphs, Data Fabric, and Why Process Still Rules (w/ Terry Dorsey)

Build powerful, smarter agents with Elasticsearch and Microsoft Foundry

How to make datamap web-apps of embedding vectors via open source tooling

Securing Retrieval-Augmented Generation: How to Defend Vector Databases Against 2025 Threats

Architecting Scalable Multi-Modal Video Search

Large-Scale Video Intelligence

Navigating healthcare scientific knowledge:building AI agents for accurate biomedical data retrieval

Streamlining Data Pipelines with MCP Servers and Vector Engines

Sponsored by: Cognizant | How Cognizant Helped RJR Transform Market Intelligence with GenAI

AI and the Lakehouse: How Starburst is Pioneering New Workflows

Gen AI Deployment and Monitoring

LanceDB: A Complete Search and Analytical Store for Serving Production-scale AI Applications

Optimize Cost and User Value Through Model Routing AI Agent

Gen AI Evaluation and Governance

Gen AI Application Development