talk-data.com

Topic: artificial intelligence (ai) (1 tagged)

Activity Trend: peak of 1 activity per quarter, 2020-Q1 to 2026-Q1

Activities

Showing results filtered by: Hamel Husain

Evals for AI Engineers

Stop using guesswork to find out how your AI applications are performing. Evals for AI Engineers equips you with the proven tools and processes required to systematically test, measure, and enhance the reliability of AI applications, especially those using LLMs. Written by AI engineers with extensive experience in real-world consulting (across 35+ AI products) and cutting-edge research, this practical resource will help you move from assumptions to robust, data-driven evaluation.

Ideal for software engineers, technical product managers, and technical leads, this hands-on guide dives into techniques like error analysis, synthetic data generation, automated LLM-as-a-judge systems, production monitoring, and cost optimization. You'll learn how to debug LLM behavior, design test suites based on synthetic and real data, and build data flywheels that improve over time. Whether you're starting without user data or scaling a production system, you'll gain the skills to build AI you can trust, with processes that are repeatable, measurable, and aligned with real-world outcomes.

- Run systematic error analyses to uncover, categorize, and prioritize failure modes
- Build, implement, and automate evaluation pipelines using code-based and LLM-based metrics (a minimal sketch follows this list)
- Optimize AI performance and costs through smart evaluation and feedback loops
- Apply key principles and techniques for monitoring AI applications in production
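To make the "LLM-based metrics" item concrete: the LLM-as-a-judge pattern has a second model grade each answer against a rubric, and the pass rate over a test suite becomes the metric. The sketch below is illustrative only and is not taken from the book; it assumes the OpenAI Python SDK, and the rubric wording, model name, and PASS/FAIL protocol are made-up placeholders.

```python
# Minimal LLM-as-a-judge sketch (illustrative, not from the book).
# Assumes the OpenAI Python SDK with OPENAI_API_KEY set in the environment;
# the rubric, model name, and PASS/FAIL convention are placeholder choices.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI assistant's answer.
Question: {question}
Answer: {answer}
Reply with exactly one word: PASS if the answer is correct and complete,
FAIL otherwise."""


def judge(question: str, answer: str, model: str = "gpt-4o-mini") -> bool:
    """Return True if the judge model labels the answer PASS."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
        temperature=0,  # deterministic grading
    )
    verdict = response.choices[0].message.content.strip().upper()
    return verdict.startswith("PASS")


# Usage: run the judge over a small test suite and report the pass rate,
# which serves as the LLM-based metric in an evaluation pipeline.
suite = [
    ("What is 2 + 2?", "4"),
    ("Name the capital of France.", "Paris"),
]
results = [judge(q, a) for q, a in suite]
print(f"pass rate: {sum(results)}/{len(results)}")
```

In practice such a judge would itself be validated against human labels before its pass rate is trusted, which is the kind of process the book's error-analysis and pipeline chapters address.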