talk-data.com

Susan Shu Chang

Speaker

Event: PyData Boston 2025

Talks & appearances

Evaluating AI Agents in production with Python

This talk covers methods for evaluating AI agents, using as an example the Python-based evaluation framework we built for a user-facing AI agent system that has been in production for over a year. We share the tools and Python frameworks we used, along with their tradeoffs and alternatives, and discuss evaluation methods such as LLM-as-judge, rules-based evaluations, and ML metrics, as well as the tradeoffs involved in choosing among them.
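To make two of the methods named in the abstract concrete, here is a minimal sketch of a rules-based evaluation plus a precision/recall check of automated verdicts (for example from an LLM judge) against human labels. It is not the speakers' framework: `AgentResponse`, the rule names, the 500-character limit, and the helper functions are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentResponse:
    """Hypothetical record of one agent turn; field names are illustrative."""
    question: str
    answer: str
    tools_called: list[str]


# A rule is a named predicate over a response (True means the check passed).
Rule = tuple[str, Callable[[AgentResponse], bool]]

# Assumed example rules; a real system would define its own checks.
RULES: list[Rule] = [
    ("non_empty_answer", lambda r: bool(r.answer.strip())),
    ("under_length_limit", lambda r: len(r.answer) <= 500),
    ("called_a_tool", lambda r: len(r.tools_called) > 0),
]


def run_rules(response: AgentResponse) -> dict[str, bool]:
    """Apply every rule to one response and report pass/fail per rule."""
    return {name: check(response) for name, check in RULES}


def pass_rate(responses: list[AgentResponse]) -> float:
    """Fraction of responses that pass all rules (0.0 for an empty list)."""
    if not responses:
        return 0.0
    passed = sum(all(run_rules(r).values()) for r in responses)
    return passed / len(responses)


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Compare automated verdicts with human labels; True means 'response is good'."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    fn = sum(a and not p for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Rules like these are cheap and deterministic, so they can run on every response, while comparing judge verdicts against a small human-labeled sample is one way to check whether an automated judge can be trusted before relying on it.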