LLM agents often drift into failure when prompts, retrieval, external data, and policies interact in unpredictable ways. This technical session introduces a repeatable, metric-driven framework for detecting, diagnosing, and correcting these undesirable behaviors in agentic systems at production scale. We demonstrate how to instrument the agent loop with fine-grained signals (tool-selection quality, error rates, action progression, latency, and domain-specific metrics) and send them to an evaluation layer (e.g., Galileo). This telemetry enables a virtuous cycle of system improvement. We present a practical example of a stock-trading system and show how brittle retrieval and faulty business logic cause undesirable behavior. We then refactor prompts and adjust the retrieval pipeline, verifying recovery through improved metrics. Attendees will learn how to add observability with minimal code changes, pinpoint root causes via tracing, and drive continuous, metric-validated improvement.
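As a rough illustration of what instrumenting the agent loop might look like, the sketch below wraps each tool call and records latency, errors, and whether the chosen tool matched an expected one. It is not the speaker's implementation and does not use Galileo's API: `Telemetry`, `StepRecord`, `instrumented_call`, the `get_quote` tool, and the `ACME` ticker are all hypothetical names, and the in-memory list stands in for whatever evaluation layer would receive the records.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class StepRecord:
    """One agent step: the tool chosen, its latency, and any error raised."""
    tool: str
    expected_tool: Optional[str]  # hypothetical ground-truth label, if available
    latency_s: float
    error: Optional[str]


@dataclass
class Telemetry:
    """In-memory sink (assumption); a real system would forward records elsewhere."""
    steps: List[StepRecord] = field(default_factory=list)

    def record(self, step: StepRecord) -> None:
        self.steps.append(step)

    def summary(self) -> Dict[str, float]:
        n = len(self.steps)
        if n == 0:
            return {"steps": 0, "error_rate": 0.0,
                    "tool_selection_accuracy": 0.0, "mean_latency_s": 0.0}
        errors = sum(1 for s in self.steps if s.error is not None)
        labeled = [s for s in self.steps if s.expected_tool is not None]
        correct = sum(1 for s in labeled if s.tool == s.expected_tool)
        return {
            "steps": n,
            "error_rate": errors / n,
            "tool_selection_accuracy": correct / len(labeled) if labeled else 0.0,
            "mean_latency_s": sum(s.latency_s for s in self.steps) / n,
        }


def instrumented_call(telemetry: Telemetry, tool_name: str,
                      tool_fn: Callable[[Any], Any], arg: Any,
                      expected_tool: Optional[str] = None) -> Any:
    """Run one tool call, recording latency and any exception instead of raising."""
    start = time.perf_counter()
    error: Optional[str] = None
    result = None
    try:
        result = tool_fn(arg)
    except Exception as exc:
        error = f"{type(exc).__name__}: {exc}"
    telemetry.record(
        StepRecord(tool_name, expected_tool, time.perf_counter() - start, error)
    )
    return result


def get_quote(ticker: str) -> float:
    """Hypothetical tool with brittle lookup: unknown tickers raise KeyError."""
    prices = {"ACME": 12.5}
    return prices[ticker]


telemetry = Telemetry()
instrumented_call(telemetry, "get_quote", get_quote, "ACME", expected_tool="get_quote")
instrumented_call(telemetry, "get_quote", get_quote, "UNKNOWN", expected_tool="get_quote")
# The agent picked get_quote where the label says place_order was the right tool.
instrumented_call(telemetry, "get_quote", get_quote, "ACME", expected_tool="place_order")
print(telemetry.summary())
```

Catching the exception inside the wrapper keeps the agent loop running while still surfacing the failure in the error rate; whether swallowing errors is appropriate depends on the system, so treat that as a design assumption rather than a recommendation.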
Speaker
Atindriyo Sanyal
7 talks
Atindriyo Sanyal is the CTO and co-founder of Galileo, a Gen AI reliability company. He brings extensive engineering leadership from roles at Apple and Uber, including work on early versions of Siri and scaling Uber’s AI initiatives with the Feature Store (Michelangelo), helping bring thousands of AI models into production for over a billion users. As Galileo’s co-founder, he leads Engineering and Research to build the world’s first AI Reliability Platform for Gen AI, focused on robust evaluations and observability to foster trust in AI applications.
Bio from: Data + AI Summit 2025
Talks & appearances
Hear from VC leaders, startup founders, and early-stage customers building on Databricks about what they are seeing in the market and how they are scaling their early-stage companies on Databricks. This event is a must-see for VCs, founders, and anyone interested in the early-stage company ecosystem.
Explore agents with frameworks like LangChain and vector databases, and learn how to build apps that can perform tasks autonomously.
Learn how to tune models on your own treasure trove of data.
Image generators like Midjourney and Stable Diffusion have gotten stronger and stronger in record time. Learn to leverage them for max effect now.
Learn about fine-tuning, prompt tuning, guardrails, and middleware to make LLMs more consistent and reliable.
Find out about the latest attack vectors, the security challenges that LLMs and generative AI present for teams, and how you can mitigate them.