In this talk, Hugo Bowne-Anderson, an independent data and AI consultant, educator, and host of the podcasts Vanishing Gradients and High Signal, shares his journey from academic research and curriculum design at DataCamp to advising teams at Netflix, Meta, and the US Air Force. Together, we explore how to build reliable, production-ready AI systems—from prompt evaluation and dataset design to embedding agents into everyday workflows.
You’ll learn about:
- How to structure teams and incentives for successful AI adoption
- Practical prompting techniques for accurate timestamp and data generation
- Building and maintaining evaluation sets to avoid “prompt overfitting”
- Cost-effective methods for LLM evaluation and monitoring
- Tools and frameworks for debugging and observing AI behavior (Logfire, Braintrust, Arize Phoenix)
- The evolution of AI agents: from simple RAG systems to proactive, embedded assistants
- How to escape “proof of concept purgatory” and prioritize AI projects that drive business value
- Step-by-step guidance for building reliable, evaluable AI agents

This session is ideal for AI engineers, data scientists, ML product managers, and startup founders looking to move beyond experimentation into robust, scalable AI systems. Whether you’re optimizing RAG pipelines, evaluating prompts, or embedding AI into products, this talk offers actionable frameworks to guide you from concept to production.
LINKS:
- Escaping POC Purgatory: Evaluation-Driven Development for AI Systems - https://www.oreilly.com/radar/escaping-poc-purgatory-evaluation-driven-development-for-ai-systems/
- Stop Building AI Agents - https://www.decodingai.com/p/stop-building-ai-agents
- How to Evaluate LLM Apps Before You Launch - https://www.youtube.com/watch?si=90fXJJQThSwGCaYv&v=TTr7zPLoTJI&feature=youtu.be
- My Vanishing Gradients Substack - https://hugobowne.substack.com/
- Building LLM Applications for Data Scientists and Software Engineers - https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=datatalksclub

TIMECODES:
00:00 Introduction and Expertise
04:04 Transition to Freelance Consulting and Advising
08:49 Restructuring Teams and Incentivizing AI Adoption
12:22 Improving Prompting for Timestamp Generation
17:38 Evaluation Sets and Failure Analysis for Reliable Software
23:00 Evaluating Prompts: The Cost and Size of Gold Test Sets
27:38 Software Tools for Evaluation and Monitoring
33:14 Evolution of AI Tools: Proactivity and Embedded Agents
40:12 The Future of AI is Not Just Chat
44:38 Avoiding Proof of Concept Purgatory: Prioritizing RAG for Business Value
50:19 RAG vs. Agents: Complexity and Power Trade-Offs
56:21 Recommended Steps for Building Agents
59:57 Defining Memory in Multi-Turn Conversations
Connect with Hugo:
- Twitter - https://x.com/hugobowne
- LinkedIn - https://www.linkedin.com/in/hugo-bowne-anderson-045939a5/
- GitHub - https://github.com/hugobowne
- Website - https://hugobowne.github.io/

Connect with DataTalks.Club:
- Join the community - https://datatalks.club/slack.html
- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
- Check other upcoming events - https://lu.ma/dtc-events
- GitHub - https://github.com/DataTalksClub
- LinkedIn - https://www.linkedin.com/company/datatalks-club/
- Twitter - https://twitter.com/DataTalksClub
- Website - https://datatalks.club/