talk-data.com

Topic

RAG

Retrieval Augmented Generation (RAG)

ai · machine_learning · llm · 369 tagged

Activity Trend: 83 peak/qtr, 2020-Q1 to 2026-Q1

Activities

369 activities · Newest first

Context Engineering with DSPy

AI agents need the right context at the right time to do a good job. Too much input increases cost and harms accuracy, while too little causes instability and hallucinations. Context Engineering with DSPy introduces a practical, evaluation-driven way to design AI systems that remain reliable, predictable, and easy to maintain as they grow. AI engineer and educator Mike Taylor explains DSPy in a clear, approachable style, showing how its modular structure, portable programs, and built-in optimizers help teams move beyond guesswork. Through real examples and step-by-step guidance, you'll learn how DSPy's signatures, modules, datasets, and metrics work together to solve context engineering problems that evolve as models change and workloads scale. This book supports AI engineers, data scientists, machine learning practitioners, and software developers building AI agents, retrieval-augmented generation (RAG) systems, and multistep reasoning workflows that hold up in production.

- Understand the core ideas behind context engineering and why they matter
- Structure LLM pipelines with DSPy's maintainable, reusable components
- Apply evaluation-driven optimizers like GEPA and MIPROv2 for measurable improvements
- Create reproducible RAG and agentic workflows with clear metrics
- Develop AI systems that stay robust across providers, model updates, and real-world constraints

Generative AI on Microsoft Azure

Companies are now moving generative AI projects from the lab to production environments. To support these increasingly sophisticated applications, they're turning to advanced practices such as multiagent architectures and complex code-based frameworks. This practical handbook shows you how to leverage cutting-edge techniques using Microsoft's powerful ecosystem of tools to deploy trustworthy AI systems tailored to your organization's needs. Written for and by AI professionals, Generative AI on Microsoft Azure goes beyond the technical core aspects, examining underlying principles, tools, and practices in depth, from the art of prompt engineering to strategies for fine-tuning models to advanced techniques like retrieval-augmented generation (RAG) and agentic AI. Through real-world case studies and insights from top experts, you'll learn how to harness AI's full potential on Azure, paving the way for groundbreaking solutions and sustainable success in today's AI-driven landscape.

- Understand the technical foundations of generative AI and how the technology has evolved over the last few years
- Implement advanced GenAI applications using Microsoft services like Azure AI Foundry, Copilot, GitHub Models, Azure Databricks, and Snowflake on Azure
- Leverage patterns, tools, frameworks, and platforms to customize AI projects
- Manage, govern, and secure your AI-enabled systems with responsible AI practices
- Build upon expert guidance to avoid common pitfalls, future-proof your applications, and more

Organizations now have more options than ever for building effective RAG systems, and those options bring confusion. Many organizations are looking to capitalize on new innovations such as long context windows, knowledge graphs, reasoning models, multi-agent systems, and beyond. Attend this session to learn about seven challenges with RAG systems, their associated architectural choices, and best practices to improve their performance.

RAG has emerged as a powerful approach for building advanced AI systems that combine the strengths of large language models with external knowledge sources. However, RAG solutions struggle with reliability and require a lot of experimentation. This session will address key questions to help determine the best design pattern and optimization for RAG implementations.

GenAI solutions involve several choices and trade-offs. A critical decision is: should you build custom AI solutions in-house or buy off-the-shelf products? This session brings together a debate on the trade-offs, risks, and rewards of each approach. The session is built around scenarios and use cases that highlight key considerations such as cost, reliability, flexibility, and speed across decisions such as LLMs vs. SLMs, RAG vs. AI agents, packaged platform capability vs. bespoke custom solution, and packaged vs. open source.

In this talk, the speaker presents NuExtract, the first LLM specialized in extracting structured information (JSON output), and NuMarkdown, the first reasoning OCR LLM (RAG-ready Markdown output). The talk demonstrates low-hallucination open-source models that outclass frontier LLMs like GPT-5 and Gemini 2.5 while being orders of magnitude smaller, enabling private usage. It will demonstrate the abilities of these LLMs, show how to use them at scale, and discuss what’s coming next in information extraction.
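NuExtract's template-driven approach can be made concrete with a small sketch: the model is prompted with a JSON template whose fields it fills in, and the caller verifies the output conforms before trusting it. The check below is a deliberately simplified, flat-template illustration; the field names and validation rules are hypothetical, not NuExtract's actual schema handling:

```python
import json

# Hypothetical extraction template: the model fills in string values,
# leaving "" where nothing can be extracted from the document.
TEMPLATE = {"name": "", "date": "", "amount": ""}

def conforms(output_json: str, template: dict) -> bool:
    """Check that extracted JSON has exactly the template's fields and
    only string values -- a crude guard against hallucinated fields."""
    try:
        data = json.loads(output_json)
    except json.JSONDecodeError:
        return False
    return set(data) == set(template) and all(
        isinstance(v, str) for v in data.values()
    )

ok = '{"name": "ACME Corp", "date": "2024-03-01", "amount": "$1,200"}'
hallucinated = '{"name": "ACME Corp", "ceo": "John Doe"}'  # field not in template
```

Rejecting any output whose keys stray from the template is one simple way to keep extraction low-hallucination at scale.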

AWS re:Invent 2025 - Optimize agentic AI apps with semantic caching in Amazon ElastiCache (DAT451)

Multi-agent AI systems now orchestrate complex workflows requiring frequent foundation model calls. In this session, learn how semantic caching with vector search for Amazon ElastiCache for Valkey can reduce latencies in agentic AI applications from single-digit seconds to single-digit milliseconds, while also reducing the cost incurred from your foundation models for production workloads. By implementing semantic caching in agentic architectures like RAG-powered assistants and autonomous agents, customers can create performant and cost-effective production-scale agentic AI systems.
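The core mechanic of semantic caching can be sketched in a few lines: embed each prompt, and when a new prompt's embedding is close enough to a cached one, return the cached response instead of calling the foundation model. The toy bag-of-words embedding and in-memory list below stand in for a real embedding model and ElastiCache's vector search, and the 0.8 threshold is an arbitrary illustration:

```python
import math
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []  # (embedding, response)

    def get(self, prompt: str):
        q = toy_embed(prompt)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip the foundation-model call
        return None

    def put(self, prompt: str, response: str):
        self.entries.append((toy_embed(prompt), response))

cache = SemanticCache()
cache.put("what is the capital of france", "Paris")
hit = cache.get("what is the capital of france?")  # near-duplicate wording
```

A near-duplicate prompt hits the cache even though the strings differ, which is exactly what exact-match caching cannot do.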

Learn More: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#AWSreInvent #AWSreInvent2025 #AWS

Building a Data and AI Platform with PostgreSQL

In a world where data sovereignty, scalability, and AI innovation are at the forefront of enterprise strategy, PostgreSQL is emerging as the key to unlocking transformative business value. This new guide serves as your beacon for navigating the convergence of AI, open source technologies, and intelligent data platforms. Authors Tom Taulli, Benjamin Anderson, and Jozef de Vries offer a strategic and practical approach to building AI and data platforms that balance innovation with governance, empowering organizations to take control of their data future. Whether you're designing frameworks for advanced AI applications, modernizing legacy infrastructures, or solving data challenges at scale, you can use this guide to bridge the gap between technical complexity and actionable strategy. Written for IT executives, data leaders, and practitioners alike, it will equip you with the tools and insights to harness PostgreSQL's unique capabilities—extensibility, unstructured data management, and hybrid workloads—for long-term success in an AI-driven world.

- Learn how to build an AI and data platform using PostgreSQL
- Overcome data challenges like modernization, integration, and governance
- Optimize AI performance with model fine-tuning and retrieval-augmented generation (RAG) best practices
- Discover use cases that align data strategy with business goals
- Take charge of your data and AI future with this comprehensive and accessible roadmap

No Cloud? No Problem. Local RAG with Embedding Gemma

Running Retrieval-Augmented Generation (RAG) pipelines often feels tied to expensive cloud APIs or large GPU clusters—but it doesn’t have to be. This session explores how Embedding Gemma, Google’s lightweight open embedding model, enables powerful RAG and text classification workflows entirely on a local machine. Using the Sentence Transformers framework with Hugging Face, high-quality embeddings can be generated efficiently for retrieval and classification tasks. Real-world examples involving call transcripts and agent remark classification illustrate how robust results can be achieved without the cloud—or the budget.
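The retrieval half of such a pipeline reduces to: embed the documents, embed the query, rank by similarity. The sketch below uses a toy term-frequency embedding so it runs offline with the standard library alone; the session would use EmbeddingGemma through the Sentence Transformers framework for the `embed` step instead:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy term-frequency "embedding" so the sketch runs offline; a real
    # pipeline would call a Sentence Transformers model's encode() here.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Core RAG retrieval step: rank documents by similarity to the
    # query and return the top k to feed into the generator's prompt.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# Illustrative call-transcript snippets, echoing the session's examples.
transcripts = [
    "caller asked the agent to reset her account password",
    "agent escalated a billing dispute to a supervisor",
    "customer praised the support team for a quick fix",
]
top = retrieve("help with password reset", transcripts, k=1)
```

Swapping the toy `embed` for a real local embedding model is the only change needed to make this a working offline retriever.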

Building Production RAG Systems for Health Care Domains: Clinical Decision

Building on but moving far beyond the single-specialty focus of HandRAG, this session examines how Retrieval-Augmented Generation can be engineered to support clinical reasoning across multiple high-stakes surgical areas, including orthopedic, cardiovascular, neurosurgical, and plastic surgery domains. Using a corpus of more than 7,800 clinical publications and cross-specialty validation studies, the talk highlights practical methods for structuring heterogeneous medical data, optimizing vector retrieval with up to 35% latency gains, and designing prompts that preserve terminology accuracy across diverse subspecialties. Attendees will also learn a three-tier evaluation framework that improved critical-error detection by 2.4×, as well as deployment strategies such as automated literature refresh pipelines and cost-efficient architectures that reduced inference spending by 60%, enabling RAG systems to operate reliably in real production healthcare settings.

The Boringly Simple Loop Powering GenAI Apps

Do you feel lost in the jungle of GenAI frameworks and buzzwords? Here's a way out. Take any GenAI app, peel away the fluff, and look at its core. You'll find the same pattern: a boringly simple nested while loop. I will show you how this loop produces chat assistants, AI agents, and multi-agent systems. Then we'll cover how RAG, tool-calling, and memory are like lego bricks we add as needed. This gives you a first-principles based map. Use it to build GenAI apps from scratch; no frameworks needed.
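That nested loop can be sketched end to end: an outer loop over user turns, and an inner loop that lets the model request tools until it produces a final answer. The stub model and clock tool below are hypothetical placeholders for a real chat-completion API and real tools:

```python
def fake_llm(messages):
    # Stub standing in for a real model call. It "decides" to use a tool
    # once, then answers; a real app would call a chat API here.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "clock", "args": {}}
    return {"answer": "It is 12:00."}

def clock_tool():
    return "12:00"

TOOLS = {"clock": clock_tool}

def agent_turn(messages):
    # Inner loop: keep calling the model, executing any tool it
    # requests, until it returns a final answer.
    while True:
        reply = fake_llm(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})

def chat():
    # Outer loop: one iteration per user message.
    messages = []
    for user_input in ["what time is it?"]:  # stands in for reading stdin
        messages.append({"role": "user", "content": user_input})
        answer = agent_turn(messages)
        messages.append({"role": "assistant", "content": answer})
    return messages
```

RAG, memory, and multi-agent handoff all slot into this same skeleton: RAG prepends retrieved context to `messages`, memory persists them between turns, and a "tool" can itself be another agent.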

Keynote by Lisa Amini: What’s Next in AI for Data and Data Management?

Advances in large language models (LLMs) have propelled a recent flurry of AI tools for data management and operations. For example, AI-powered code assistants leverage LLMs to generate code for dataflow pipelines. RAG pipelines enable LLMs to ground responses with relevant information from external data sources. Data agents leverage LLMs to turn natural language questions into data-driven answers and actions. While challenges remain, these advances are opening exciting new opportunities for data scientists and engineers. In this talk, we will examine recent advances, along with some still incubating in research labs, with the goal of understanding where this is all heading, and present our perspective on what’s next for AI in data management and data operations.

Where Have All the Metrics Gone?

How exactly does one validate the factuality of answers from a Retrieval-Augmented Generation (RAG) system? Or measure the impact of the new system prompt for your customer service agent? What do you do when stakeholders keep asking for "accuracy" metrics that you simply don't have? In this talk, we’ll learn how to define (and measure) what “good” looks like when traditional model metrics don’t apply.
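One practical move when stakeholders ask for "accuracy" you don't have is to define a computable proxy instead. The function below is one illustrative groundedness proxy (not the talk's own method): the fraction of answer sentences that share enough content words with the retrieved context:

```python
import re

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))

def grounded_fraction(answer: str, context: str, min_overlap: int = 2) -> float:
    """Fraction of answer sentences sharing at least `min_overlap`
    words with the retrieved context -- a crude groundedness proxy
    for spotting answers the retrieval step cannot support."""
    ctx = words(context)
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    grounded = sum(1 for s in sentences if len(words(s) & ctx) >= min_overlap)
    return grounded / len(sentences)

context = "The warranty covers parts and labor for two years."
good = "The warranty covers parts and labor. It lasts two years."
bad = "Refunds are processed within thirty days."
```

Word overlap is deliberately simple; the point is that once "good" is defined as a number, you can track it across prompt changes, which is exactly what stakeholders asking for "accuracy" usually need.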

Scaling Python to thousands of nodes with Ray

Python is the language of choice for anything to do with AI and ML. While that has made it easy to write code for one machine, it's much more difficult to run workloads across clusters of thousands of nodes. Ray lets you do just that. I'll demonstrate how to use this open source tool with just a few lines of code. As a demo project, I'll show how I built a RAG system for the Wheel of Time series.
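Ray's task API (`@ray.remote` to declare a task, `.remote()` to launch it, `ray.get()` to collect results) follows a scatter-gather shape. The standard-library sketch below shows that shape with a thread pool standing in for Ray's cluster scheduler; the chunking and word-count task are illustrative stand-ins for real per-chunk embedding work:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_chunk(chunk: str) -> tuple[str, int]:
    # Stand-in for per-chunk work (e.g., embedding a book passage);
    # under Ray this would be an @ray.remote task scheduled on the cluster.
    return (chunk, len(chunk.split()))

chunks = ["the wheel of time turns", "and ages come and pass"]

# Scatter the chunks, gather the results -- the same shape as
# ray.get([embed_chunk.remote(c) for c in chunks]) on a Ray cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(embed_chunk, chunks))
```

The appeal of Ray is that the same few lines scale from a laptop thread pool to thousands of nodes without restructuring the program.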

Building Agentic AI: Workflows, Fine-Tuning, Optimization, and Deployment

Transform Your Business with Intelligent AI to Drive Outcomes

Building reactive AI applications and chatbots is no longer enough. The competitive advantage belongs to those who can build AI that can respond, reason, plan, and execute. Building Agentic AI: Workflows, Fine-Tuning, Optimization, and Deployment takes you beyond basic chatbots to create fully functional, autonomous agents that automate real workflows, enhance human decision-making, and drive measurable business outcomes across high-impact domains like customer support, finance, and research. Whether you're a developer deploying your first model, a data scientist exploring multi-agent systems and distilled LLMs, or a product manager integrating AI workflows and embedding models, this practical handbook provides tried and tested blueprints for building production-ready systems. Harness the power of reasoning models for applications like computer use, multimodal systems to work with all kinds of data, and fine-tuning techniques to get the most out of AI. Learn to test, monitor, and optimize agentic systems to keep them reliable and cost-effective at enterprise scale.

- Master the complete agentic AI pipeline
- Design adaptive AI agents with memory, tool use, and collaborative reasoning capabilities
- Build robust RAG workflows using embeddings, vector databases, and LangGraph state management
- Implement comprehensive evaluation frameworks beyond accuracy, including precision, recall, and latency metrics
- Deploy multimodal AI systems that seamlessly integrate text, vision, audio, and code generation
- Optimize models for production through fine-tuning, quantization, and speculative decoding techniques
- Navigate the bleeding edge of reasoning LLMs and computer-use capabilities
- Balance cost, speed, accuracy, and privacy in real-world deployment scenarios
- Create hybrid architectures that combine multiple agents for complex enterprise applications

Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

AWS re:Invent 2025 - A practitioner’s guide to data for agentic AI (DAT315)

In this session, gain the skills needed to deploy end-to-end agentic AI applications using your most valuable data. This session focuses on data management using processes like Model Context Protocol (MCP) and Retrieval Augmented Generation (RAG), and provides concepts that apply to other methods of customizing agentic AI applications. Discover best-practice architectures using AWS database services like Amazon Aurora and OpenSearch Service, along with the analytical, data processing, and streaming experiences found in SageMaker Unified Studio. Learn data lake, governance, and data quality concepts, and see how Amazon Bedrock AgentCore, Bedrock Knowledge Bases, and other features tie solution components together.
