2025-12-10 at 19:15
Evaluating AI Agents in production with Python
Event: PyData Boston 2025
Description
This talk covers methods for evaluating AI agents, using as an example the Python-based evaluation framework the speakers built for a user-facing AI agent system that has been in production for over a year. We share the tools and Python frameworks we used, along with their tradeoffs and alternatives, and discuss evaluation methods such as LLM-as-Judge, rules-based evaluations, and ML metrics, as well as the tradeoffs involved in choosing among them.