talk-data.com talk-data.com

PyData talk 2025-11-08 at 22:35

Evaluation is all you need

Description

LLM apps fail without reliable, reproducible evaluation. This talk maps the open‑source evaluation landscape, compares leading techniques (RAGAS, Evaluation Driven Development) and frameworks (DeepEval, Phoenix, LangFuse, and braintrust), and shows how to combine tests, RAG‑specific evals, and observability to ship higher‑quality systems. Attendees leave with a decision checklist, code patterns, and a production‑ready playbook.