talk-data.com

Topic: Retrieval Augmented Generation (RAG)

Tags: ai · machine_learning · llm · 74 tagged

Activity Trend: 83 peak/qtr (2020-Q1 to 2026-Q1)

Activities: 74 activities · Newest first

AWS re:Invent 2025 - Optimize agentic AI apps with semantic caching in Amazon ElastiCache (DAT451)

Multi-agent AI systems now orchestrate complex workflows that require frequent foundation model calls. In this session, learn how semantic caching with vector search for Amazon ElastiCache for Valkey can reduce latencies in agentic AI applications from single-digit seconds to single-digit milliseconds, while also reducing the foundation model costs of your production workloads. By implementing semantic caching in agentic architectures such as RAG-powered assistants and autonomous agents, customers can build performant, cost-effective, production-scale agentic AI systems.
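
The semantic-caching idea the session describes can be sketched framework-free: embed each query, and on a new query return the cached answer whose embedding is close enough to an earlier one. This is a minimal sketch under stated assumptions; the tiny vectors are toy stand-ins for real embeddings, and the model call and ElastiCache integration are not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Return a cached LLM answer when a new query is semantically close
    to an earlier one, instead of calling the model again."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, embedding):
        best, best_sim = None, 0.0
        for emb, answer in self.entries:
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best, best_sim = answer, sim
        return best if best_sim >= self.threshold else None

    def put(self, embedding, answer):
        self.entries.append((embedding, answer))

cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0, 0.1], "Valkey supports vector search.")
hit = cache.get([0.99, 0.01, 0.1])   # near-identical query: cache hit
miss = cache.get([0.0, 1.0, 0.0])    # unrelated query: cache miss
```

A cache hit skips the foundation model call entirely, which is where both the latency and cost savings come from.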

Learn More: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#AWSreInvent #AWSreInvent2025 #AWS

No Cloud? No Problem. Local RAG with Embedding Gemma

Running Retrieval-Augmented Generation (RAG) pipelines often feels tied to expensive cloud APIs or large GPU clusters—but it doesn’t have to be. This session explores how Embedding Gemma, Google’s lightweight open embedding model, enables powerful RAG and text classification workflows entirely on a local machine. Using the Sentence Transformers framework with Hugging Face, high-quality embeddings can be generated efficiently for retrieval and classification tasks. Real-world examples involving call transcripts and agent remark classification illustrate how robust results can be achieved without the cloud—or the budget.
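
The retrieval core of such a local pipeline can be sketched as follows. The 3-d vectors below are toy stand-ins so the logic runs anywhere; in the real workflow they would come from Embedding Gemma via Sentence Transformers (shown as a comment; the model identifier is an assumption).

```python
import math

# In the real pipeline the vectors would come from Embedding Gemma, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("google/embeddinggemma-300m")  # model id assumed
#   vecs = model.encode(list(docs))
# Toy 3-d vectors stand in here so the retrieval logic is self-contained.

docs = {
    "refund policy call": [0.9, 0.1, 0.0],
    "billing dispute call": [0.8, 0.2, 0.1],
    "outage report call": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, k=2):
    """Rank transcripts by cosine similarity to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

top = retrieve([1.0, 0.0, 0.0])
```

The same embeddings can feed a nearest-centroid or logistic-regression classifier for the agent-remark classification use case, all without a cloud API.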

Building Production RAG Systems for Health Care Domains: Clinical Decision

Building on, but moving far beyond, the single-specialty focus of HandRAG, this session examines how Retrieval-Augmented Generation can be engineered to support clinical reasoning across multiple high-stakes surgical areas, including orthopedic, cardiovascular, neurosurgical, and plastic surgery domains. Using a corpus of more than 7,800 clinical publications and cross-specialty validation studies, the talk highlights practical methods for structuring heterogeneous medical data, optimizing vector retrieval with up to 35% latency gains, and designing prompts that preserve terminology accuracy across diverse subspecialties. Attendees will also learn a three-tier evaluation framework that improved critical-error detection by 2.4×, as well as deployment strategies, such as automated literature refresh pipelines and cost-efficient architectures that reduced inference spending by 60%, that enable RAG systems to operate reliably in real production healthcare settings.

The Boringly Simple Loop Powering GenAI Apps

Do you feel lost in the jungle of GenAI frameworks and buzzwords? Here's a way out. Take any GenAI app, peel away the fluff, and look at its core. You'll find the same pattern: a boringly simple nested while loop. I will show you how this loop produces chat assistants, AI agents, and multi-agent systems. Then we'll cover how RAG, tool-calling, and memory are like Lego bricks we add as needed. This gives you a first-principles map. Use it to build GenAI apps from scratch; no frameworks needed.
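
The nested while loop the abstract names can be sketched like this. All names here are illustrative, not the speaker's code, and a stubbed model stands in for a real LLM.

```python
def run_app(model, get_user_input, tools):
    """Outer loop: one turn per user message. Inner loop: keep calling the
    model until it stops requesting tools and produces a final answer."""
    history = []
    while True:                      # outer loop: the conversation
        user = get_user_input()
        if user is None:
            return history
        history.append(("user", user))
        while True:                  # inner loop: the agent step
            reply = model(history)
            if reply["type"] == "tool_call":
                result = tools[reply["name"]](reply["args"])
                history.append(("tool", result))
            else:
                history.append(("assistant", reply["text"]))
                break

# Stub model: requests one tool call, then answers from the tool result.
def stub_model(history):
    if history[-1][0] == "user":
        return {"type": "tool_call", "name": "add", "args": (2, 3)}
    return {"type": "text", "text": f"The sum is {history[-1][1]}"}

inputs = iter(["what is 2+3?"])
log = run_app(stub_model, lambda: next(inputs, None), {"add": lambda a: a[0] + a[1]})
```

A plain chat assistant is this loop with an empty tool set; RAG is a retrieval call prepended to the model call; memory is persistence of `history` between runs.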

Keynote by Lisa Amini: What’s Next in AI for Data and Data Management?

Advances in large language models (LLMs) have propelled a recent flurry of AI tools for data management and operations. For example, AI-powered code assistants leverage LLMs to generate code for dataflow pipelines. RAG pipelines enable LLMs to ground responses with relevant information from external data sources. Data agents leverage LLMs to turn natural language questions into data-driven answers and actions. While challenges remain, these advances are opening exciting new opportunities for data scientists and engineers. In this talk, we will examine recent advances, along with some still incubating in research labs, with the goal of understanding where this is all heading, and present our perspective on what’s next for AI in data management and data operations.

Where Have All the Metrics Gone?

How exactly does one validate the factuality of answers from a Retrieval-Augmented Generation (RAG) system? Or measure the impact of the new system prompt for your customer service agent? What do you do when stakeholders keep asking for "accuracy" metrics that you simply don't have? In this talk, we’ll learn how to define (and measure) what “good” looks like when traditional model metrics don’t apply.
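
When traditional accuracy metrics don't apply, one concrete starting point is a retrieval hit rate plus a crude faithfulness proxy. The metric definitions below are an illustrative sketch, not the talk's own framework; production systems typically replace the token-overlap proxy with an LLM judge.

```python
def hit_rate(retrieved, relevant, k=5):
    """Fraction of queries where at least one relevant doc appears in the top-k."""
    hits = sum(1 for r, rel in zip(retrieved, relevant) if set(r[:k]) & set(rel))
    return hits / len(retrieved)

def grounded_fraction(answer, context):
    """Crude faithfulness proxy: share of answer tokens that also appear in
    the retrieved context. An LLM judge would replace this in practice."""
    ctx = set(context.lower().split())
    tokens = answer.lower().split()
    return sum(t in ctx for t in tokens) / len(tokens)

# One of two queries retrieves a relevant doc in its top-2 results.
hr = hit_rate([["d1", "d7"], ["d3"]], [["d7"], ["d9"]], k=2)
gf = grounded_fraction("the refund takes five days",
                       "refunds take five days to process")
```

Even rough metrics like these give stakeholders a number that moves when the system prompt or retriever changes, which is the point.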

Scaling Python to thousands of nodes with Ray

Python is the language of choice for anything to do with AI and ML. While that has made it easy to write code for one machine, it's much more difficult to run workloads across clusters of thousands of nodes. Ray allows you to do just that. I'll demonstrate how to use this open-source tool with a few lines of code. As a demo project, I'll show how I built a RAG system for the Wheel of Time series.

AWS re:Invent 2025 - A practitioner’s guide to data for agentic AI (DAT315)

In this session, gain the skills needed to deploy end-to-end agentic AI applications using your most valuable data. This session focuses on data management using approaches like Model Context Protocol (MCP) and Retrieval Augmented Generation (RAG), and provides concepts that apply to other methods of customizing agentic AI applications. Discover best-practice architectures using AWS database services like Amazon Aurora and OpenSearch Service, along with the analytical, data processing, and streaming experiences found in SageMaker Unified Studio. Learn data lake, governance, and data quality concepts, and how Amazon Bedrock AgentCore, Bedrock Knowledge Bases, and other features tie solution components together.

AWS re:Invent 2025 - Autonomous agents powered by streaming data and Retrieval Augmented Generation

Unlock the potential of intelligent autonomous agents that combine real-time streaming data with Retrieval Augmented Generation (RAG) for dynamic decision-making. You will learn how to use streaming technologies like Amazon Kinesis, Amazon MSK, and Amazon Managed Service for Apache Flink to create a robust pipeline that transforms raw events into actionable insights. This session will show you how autonomous agents combine these real-time insights with a RAG architecture powered by OpenSearch, enabling immediate, context-aware responses to changing conditions. This practical architecture drives real-world value in critical scenarios like predictive maintenance, automated incident response, and intelligent customer service automation, with improved accuracy and reduced latency.
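
The streaming-plus-RAG pattern described here can be shown in miniature: events continuously update a live context store, and the agent reads the freshest state alongside vector-retrieved background. This is a toy sketch; plain Python stands in for Kinesis/Flink on the ingest side and OpenSearch on the retrieval side.

```python
from collections import deque

class LiveContext:
    """Keeps the most recent events per entity, as a Flink-style pipeline
    would; an agent reads this alongside vector-retrieved documents."""

    def __init__(self, window=3):
        self.window = window
        self.events = {}

    def ingest(self, entity_id, event):
        self.events.setdefault(entity_id, deque(maxlen=self.window)).append(event)

    def latest(self, entity_id):
        return list(self.events.get(entity_id, []))

ctx = LiveContext(window=2)
for temp in (70, 85, 103):               # simulated sensor event stream
    ctx.ingest("pump-7", {"temp_f": temp})

recent = ctx.latest("pump-7")            # only the last two readings survive
alert = recent[-1]["temp_f"] > 100       # agent escalates on the fresh reading
```

The key property is that the agent's decision depends on data that arrived seconds ago, which a batch-indexed vector store alone cannot provide.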

AWS re:Invent 2025 - Advanced agentic RAG Systems: Deep dive with Amazon Bedrock (AIM425)

Learn to build a production-grade agentic RAG system using Amazon Bedrock Knowledge Bases, Strands, and AgentCore in this expert-level code talk. Through live coding and detailed walkthroughs, learn how to build an intelligent event assistant agent that integrates knowledge retrieval, long-term memory, and user authentication. This hands-on session covers the complete journey from knowledge base setup through agent creation, memory integration (short-term and long-term), runtime deployment, and identity management. Prerequisites: strong experience with Python and familiarity with RAG concepts.

AWS re:Invent 2025 - Turn unstructured data in Amazon S3 into AI-ready assets with SageMaker Catalog

Unstructured data often holds untapped value, and Amazon SageMaker makes it possible to turn that data into insights and AI-ready assets. In this session, you'll learn how to bring unstructured data from Amazon S3 into SageMaker, create searchable assets, and build knowledge bases for Amazon Bedrock to improve retrieval-augmented generation (RAG) accuracy. Discover how teams can collaborate across roles, data users can self-serve to find and understand the right data, and governance ensures that the right people get the right access. Bayer will share how they use these capabilities to unlock unstructured data and accelerate research and innovation.

Build agents with knowledge, agentic RAG and Azure AI Search

Start building your next agent with the latest knowledge features from Azure AI Search. In this session, we will demo how to connect your agentic retrieval engine to new knowledge sources like SharePoint, web, and blob. We will also walk through new controls available to improve your RAG performance across query planning, retrieval, and answer generation. Join this code-focused breakout for samples and step-by-step guidance on connecting knowledge to your next agent.

Delivered in a silent stage breakout.

Foundry IQ: the future of RAG with knowledge retrieval and AI Search

Agents need context. How should we connect data to our agents for optimal context? In this session we will introduce Foundry IQ, the knowledge layer for agents, and the latest developments from Azure AI Search and Microsoft Foundry. Learn about multi-source RAG orchestration, retrieval steering, dynamic security controls and agentic RAG.

Build scalable AI apps with Azure SQL Database Hyperscale
breakout
by Ravi Mantena, Anna Hoffman (Azure Data), Ross Jenkins (Hexagon ALI / Octave), Aditya Badramraju (Microsoft), Britt Ewen (BlackRock), Dmitry Borodin (Hexagon Asset Lifecycle Intelligence)

Build AI apps that run securely and scale with your needs with Azure SQL Database Hyperscale. We’ll cover native vector indexes for semantic search, read scale-out for low-latency RAG, and secure model invocation with Microsoft Foundry, using the model of your choice, from T-SQL. Hear directly from global technology company Hexagon and investment firm BlackRock, who will join us to share their experience along with best practices, demos, and more!

Securing Retrieval-Augmented Generation: How to Defend Vector Databases Against 2025 Threats

Modern LLM applications rely heavily on embeddings and vector databases for retrieval-augmented generation (RAG). But in 2025, researchers and OWASP flagged vector databases as a new attack surface — from embedding inversion (recovering sensitive training text) to poisoned vectors that hijack prompts. This talk demystifies these threats for practitioners and shows how to secure your RAG pipeline with real-world techniques like encrypted stores, anomaly detection, and retrieval validation. Attendees will leave with a practical security checklist for keeping embeddings safe while still unlocking the power of retrieval.
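
Of the defenses mentioned, retrieval validation is the easiest to sketch: drop retrieved chunks whose query similarity is suspiciously low or whose vectors sit far from the collection's centroid, a crude heuristic for poisoned entries. The thresholds and vectors below are illustrative assumptions, not values from the talk.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def validate_retrieval(query_vec, results, min_sim=0.3, max_centroid_dist=1.0):
    """Filter retrieved (vector, text) pairs: reject low-similarity chunks
    and outlier vectors far from the result centroid."""
    dim = len(query_vec)
    centroid = [sum(v[i] for v, _ in results) / len(results) for i in range(dim)]
    kept = []
    for vec, text in results:
        sim = cosine(query_vec, vec)
        dist = norm([a - b for a, b in zip(vec, centroid)])
        if sim >= min_sim and dist <= max_centroid_dist:
            kept.append(text)
    return kept

results = [
    ([0.9, 0.1], "on-topic chunk"),
    ([0.8, 0.2], "another on-topic chunk"),
    ([-0.5, 0.9], "suspicious outlier chunk"),  # low similarity, far from centroid
]
safe = validate_retrieval([1.0, 0.0], results)
```

This layer does not replace encrypted stores or access control, but it cheaply blocks the prompt-hijacking path where a poisoned vector rides into the context window on an unrelated query.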

Evaluation is all you need

LLM apps fail without reliable, reproducible evaluation. This talk maps the open‑source evaluation landscape, compares leading techniques (RAGAS, Evaluation Driven Development) and frameworks (DeepEval, Phoenix, LangFuse, and braintrust), and shows how to combine tests, RAG‑specific evals, and observability to ship higher‑quality systems. Attendees leave with a decision checklist, code patterns, and a production‑ready playbook.

Real-Time Context Engineering for Agents

Agents need timely and relevant context data to work effectively in an interactive environment. If an agent takes more than a few seconds to react to an action in a client application, users will not perceive it as intelligent, just laggy.

Real-time context engineering involves building real-time data pipelines to pre-process application data and serve relevant and timely context to agents. This talk will focus on how you can leverage application identifiers (user ID, session ID, article ID, order ID, etc.) to identify which real-time context data to provide to agents. We will contrast this approach with the more traditional RAG approach of using vector indexes to retrieve chunks of relevant text using the user query. Our approach will necessitate the introduction of the Agent-to-Agent protocol, an emerging standard for defining APIs for agents.
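
The contrast drawn here can be made concrete: ID-keyed context is a direct lookup, not a similarity search. A toy dict stands in for a feature store like Hopsworks; the keys and fields are hypothetical.

```python
# Toy stand-in for a feature store: context is fetched by application
# identifiers in O(1), with no embedding or vector index involved.
feature_store = {
    ("user", "u42"): {"name": "Ada", "recent_videos": ["v9", "v3"]},
    ("session", "s7"): {"device": "ios", "started_at": "2025-06-01T10:00Z"},
}

def build_context(user_id, session_id):
    """Assemble agent context from identifiers carried by the request,
    rather than from a similarity search over text chunks."""
    return {
        "user": feature_store[("user", user_id)],
        "session": feature_store[("session", session_id)],
    }

ctx = build_context("u42", "s7")
```

Because the lookup is keyed rather than approximate, freshness and latency are bounded by the pipeline that writes the store, not by embedding quality.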

We will also demonstrate how we provide real-time context data from applications inside Python agents using the Hopsworks feature store. We will walk through an example of an interactive application (TikTok clone).

Searching for My Next Chart

Abstract

As a data visualization practitioner, I frequently draw inspiration from the diverse and rapidly expanding community, particularly through challenges like #TidyTuesday. However, the sheer volume of remarkable visualizations quickly overwhelmed my manual curation methods—from Pinterest boards to Notion pages. This created a significant bottleneck in my workflow, as I found myself spending more time cataloging charts than actively creating them.

In this talk, I will present a Retrieval Augmented Generation (RAG) based retrieval system that I designed specifically for data visualizations. I will detail the methodology behind this system, illustrating how I addressed my own workflow inefficiencies by transforming a dispersed collection of charts into a semantically searchable knowledge base. This project serves as a practical example of applying advanced AI techniques to enhance creative technical work, demonstrating how a specialized retrieval system can significantly improve the efficiency and quality of the data visualization creation process.

Building an AI Agent for Natural Language to SQL Query Execution on Live Databases

This hands-on tutorial will guide participants through building an end-to-end AI agent that translates natural language questions into SQL queries, validates and executes them on live databases, and returns accurate responses. Participants will build a system that intelligently routes between a specialized SQL agent and a ReAct chat agent, implementing RAG for query similarity matching, comprehensive safety validation, and human-in-the-loop confirmation. By the end of this session, attendees will have created a powerful and extensible system they can adapt to their own data sources.
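
The safety-validation step such an agent needs can be sketched with sqlite3: accept only a single read-only SELECT before executing model-generated SQL. The rule set below is an illustrative minimum, not the tutorial's actual validator; a real system adds human-in-the-loop confirmation on top.

```python
import re
import sqlite3

FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|attach|pragma)\b", re.I)

def safe_execute(conn, sql):
    """Execute model-generated SQL only if it is a single read-only SELECT."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        raise ValueError("multiple statements rejected")
    if not stripped.lower().startswith("select") or FORBIDDEN.search(stripped):
        raise ValueError("only read-only SELECT statements are allowed")
    return conn.execute(stripped).fetchall()

# Live (in-memory) database standing in for the real target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

rows = safe_execute(conn, "SELECT id FROM orders WHERE total > 10")
try:
    safe_execute(conn, "DROP TABLE orders")   # destructive query is rejected
except ValueError:
    blocked = True
```

Keyword filtering alone is not a complete defense; running the agent against a read-only connection or replica is the stronger companion control.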

One API to Rule Them All? LiteLLM in Production

Using LiteLLM in a Real-World RAG System: What Worked and What Didn’t

LiteLLM provides a unified interface to work with multiple LLM providers—but how well does it hold up in practice? In this talk, I’ll share how we used LiteLLM in a production system to simplify model access and handle token budgets. I’ll outline the benefits, the hidden trade-offs, and the situations where the abstraction helped—or got in the way. This is a practical, developer-focused session on integrating LiteLLM into real workflows, including lessons learned and limitations. If you’re considering LiteLLM, this talk offers a grounded look at using it beyond simple prototypes.