talk-data.com

Event

PyData Boston 2025

2025-12-08 – 2025-12-10 PyData

Activities tracked: 1

Filtering by: Susan Shu Chang

Sessions & talks

Evaluating AI Agents in production with Python

2025-12-10
talk

This talk covers methods for evaluating AI agents. As a worked example, the speakers describe the Python-based evaluation framework they built for a user-facing AI agent system that has been in production for over a year. They share the tools and Python frameworks they used (along with tradeoffs and alternatives) and discuss evaluation methods such as LLM-as-Judge, rules-based evaluations, and ML metrics, as well as the tradeoffs involved in choosing among them.
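The abstract does not describe the speakers' framework in detail. As a rough illustration of what a rules-based evaluation can look like, the sketch below applies a few deterministic checks to agent responses and then scores those verdicts against human labels with precision and recall. The `EvalCase` schema, the rule functions, the banned phrases, and the sample data are all hypothetical and are not taken from the talk.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One agent interaction plus a human pass/fail label (hypothetical schema)."""
    prompt: str
    response: str
    human_pass: bool


# A rule takes (prompt, response) and returns True when the response passes.
Rule = Callable[[str, str], bool]


def non_empty(prompt: str, response: str) -> bool:
    return bool(response.strip())


def within_length(prompt: str, response: str, max_chars: int = 500) -> bool:
    return len(response) <= max_chars


def no_banned_phrases(prompt: str, response: str) -> bool:
    # Hypothetical phrases the agent should never show to end users.
    banned = ("as an ai language model", "i cannot help")
    lowered = response.lower()
    return not any(phrase in lowered for phrase in banned)


def no_email_leak(prompt: str, response: str) -> bool:
    # Crude PII check: fail anything that looks like an email address.
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", response) is None


RULES: dict[str, Rule] = {
    "non_empty": non_empty,
    "within_length": within_length,
    "no_banned_phrases": no_banned_phrases,
    "no_email_leak": no_email_leak,
}


def rules_verdict(case: EvalCase, rules: dict[str, Rule] = RULES) -> tuple[bool, list[str]]:
    """Return (passed, names of failed rules); a case passes only if every rule passes."""
    failed = [name for name, rule in rules.items() if not rule(case.prompt, case.response)]
    return (not failed, failed)


def precision_recall(cases: list[EvalCase]) -> tuple[float, float]:
    """Treat 'rules flagged a failure' as the positive class and compare to human labels."""
    tp = fp = fn = 0
    for case in cases:
        flagged = not rules_verdict(case)[0]
        actually_bad = not case.human_pass
        if flagged and actually_bad:
            tp += 1
        elif flagged:
            fp += 1
        elif actually_bad:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labeled sample; the last case is wrong but well-formed.
SAMPLE_CASES = [
    EvalCase("Reset my password", "Click 'Forgot password' on the login page.", True),
    EvalCase("Who is my account manager?", "Contact jane.doe@example.com directly.", False),
    EvalCase("Summarize my order", "", False),
    EvalCase("Cancel my order", "As an AI language model, I cannot help.", False),
    EvalCase("Track my parcel", "It ships tomorrow.", False),
]

if __name__ == "__main__":
    for case in SAMPLE_CASES:
        print(case.prompt, rules_verdict(case))
    print(precision_recall(SAMPLE_CASES))  # → (1.0, 0.75)
```

Checks like these are cheap and deterministic, so they can serve as a first layer. The missed last case (a plausible but wrong answer that no rule catches) illustrates the kind of judgment that rules cannot make, which is where approaches such as LLM-as-Judge, also mentioned in the abstract, may be considered.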