Daina Bouquin

Activities

Talks & appearances

Is Your LLM Evaluation Missing the Point?

2025-12-10 · PyData Boston 2025

talk

AI/ML LLM

Your LLM evaluation suite shows 93% accuracy. Then domain experts point out it's producing catastrophically wrong answers for real-world use cases. This talk explores the collaboration gap between AI engineers and domain experts that technical evaluation alone cannot bridge. Drawing from government, healthcare, and civic tech case studies, we'll examine why tools like PromptFoo, DeepEval, and RAGAS are necessary but insufficient, and how structured collaboration with domain stakeholders reveals critical failures invisible to standard metrics. You'll leave with practical starting points for building cross-functional evaluation that catches problems before deployment.