LLM agents often drift into failure when prompts, retrieval, external data, and policies interact in unpredictable ways. This technical session introduces a repeatable, metric-driven framework for detecting, diagnosing, and correcting these undesirable behaviors in agentic systems at production scale. We demonstrate how to instrument the agent loop with fine-grained signals—tool-selection quality, error rates, action progression, latency, and domain-specific metrics—and feed them into an evaluation layer (e.g., Galileo). This telemetry enables a virtuous cycle of system improvement. We present a practical example of a stock-trading system and show how brittle retrieval and faulty business logic cause undesirable behavior. We then refactor prompts and adjust the retrieval pipeline, verifying recovery through improved metrics. Attendees will learn how to: add observability with minimal code change, pinpoint root causes via tracing, and drive continuous, metric-validated improvement.
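The instrumentation pattern the abstract describes can be sketched in a few lines. This is a minimal illustration, not Galileo's actual API: the `AgentTelemetry` class, `instrumented_call` wrapper, and the specific metric names are hypothetical, and a real deployment would ship these aggregates to an external evaluation layer rather than print them.

```python
import time
from dataclasses import dataclass, field

@dataclass
class AgentTelemetry:
    """Collects per-step signals (tool used, latency, success) from the agent loop."""
    tool_calls: list = field(default_factory=list)
    errors: int = 0
    steps: int = 0

    def record_step(self, tool_name, latency_s, ok):
        self.steps += 1
        if not ok:
            self.errors += 1
        self.tool_calls.append({"tool": tool_name, "latency_s": latency_s, "ok": ok})

    def metrics(self):
        """Aggregates suitable for export to an evaluation layer."""
        latencies = [c["latency_s"] for c in self.tool_calls]
        return {
            "steps": self.steps,
            "error_rate": self.errors / max(self.steps, 1),
            "avg_latency_s": sum(latencies) / max(len(latencies), 1),
        }

def instrumented_call(telemetry, tool_name, fn, *args, **kwargs):
    """Wrap a single tool invocation with timing and error capture,
    so observability is added with minimal change to the agent loop itself."""
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        ok = True
    except Exception:
        result, ok = None, False
    telemetry.record_step(tool_name, time.perf_counter() - start, ok)
    return result

# Usage: wrap each tool call inside the agent loop.
tel = AgentTelemetry()
instrumented_call(tel, "search", lambda q: q.upper(), "aapl")
instrumented_call(tel, "quote", lambda: 1 / 0)  # simulated tool failure
print(tel.metrics())
```

Because the wrapper sits around each tool call rather than inside the tools, the same pattern covers error rates, latency, and tool-selection quality without touching business logic.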
Speaker
Atindriyo Sanyal
2 talks
Atindriyo Sanyal is the CTO and co-founder of Galileo, a Gen AI reliability company. He brings extensive engineering leadership from roles at Apple and Uber, including work on early versions of Siri and scaling Uber's AI initiatives with the Michelangelo Feature Store, helping bring thousands of AI models into production for over a billion users. As Galileo's co-founder, he leads Engineering and Research to build the world's first AI Reliability Platform for Gen AI, focused on robust evaluations and observability to foster trust in AI applications.
Bio from: Data + AI Summit 2025
Talks & appearances
Hear from VC leaders, startup founders, and early-stage customers building on Databricks about what they are seeing in the market and how they are scaling their early-stage companies on Databricks. This event is a must-see for VCs, founders, and anyone interested in the early-stage company ecosystem.