talk-data.com


Speaker

Emeli Dral

2 talks


Talks & appearances

2 activities · Newest first

AI agents testing: How to evaluate the unpredictable

AI agents and multi-step workflows are powerful, but testing them is difficult. This talk explores practical ways to test these complex systems, such as running multi-step simulations, checking tool calls, and using LLMs for evaluation. You'll also learn how to prioritize what to test and how to set up session-level evaluations with open-source tools.
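As a rough illustration of what "checking tool calls" at the session level might look like, here is a minimal sketch in plain Python. It is not taken from the talk: the `ToolCall` record, the `check_tool_sequence` helper, and the tool names are all hypothetical, and the check assumes that a session trace is available as an ordered list of calls.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One tool invocation recorded in an agent session (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


def check_tool_sequence(calls, expected_names):
    """Return True if the expected tool names occur in this order.

    Other calls may be interleaved; only the relative order matters.
    """
    names = iter(call.name for call in calls)
    # `name in names` advances the iterator, so each match must come
    # after the previous one.
    return all(name in names for name in expected_names)


# A made-up session trace with three tool calls.
session = [
    ToolCall("search_flights", {"to": "LIS"}),
    ToolCall("get_weather"),
    ToolCall("book_flight", {"id": 7}),
]

ok = check_tool_sequence(session, ["search_flights", "book_flight"])
wrong_order = check_tool_sequence(session, ["book_flight", "search_flights"])
```

A deterministic check like this covers only the structure of a session; judging the quality of the final answer would still need something else, such as the LLM-based evaluation the abstract mentions.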

Proper monitoring of machine learning models in production is essential to avoid performance issues. Setting up monitoring can be easy for a single model, but it often becomes challenging at scale or when too many metrics and dashboards cause alert fatigue.

In this talk, I will introduce the concept of test-based ML monitoring. I will explore how to prioritize metrics based on risks and model use cases, integrate checks into the prediction pipeline, and standardize them across similar models and across the model lifecycle. I will also take an in-depth look at batch model monitoring architecture and the use of open-source tools for setup and analysis.
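One way to picture test-based monitoring is as a small set of pass/fail checks that run on every prediction batch, with only the high-risk checks allowed to raise an alert. The sketch below is an assumption-laden illustration in plain Python, not the speaker's implementation: the `Check` class, the metric functions, the thresholds, and the sample batch are all made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Batch = Sequence[Optional[float]]


@dataclass
class Check:
    """A pass/fail condition on one batch-level metric (hypothetical shape)."""
    name: str
    metric: Callable[[Batch], float]
    max_value: float        # the check passes when metric <= max_value
    critical: bool = False  # only critical failures raise an alert


def share_missing(values: Batch) -> float:
    """Fraction of predictions that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def share_out_of_range(values: Batch, low: float = 0.0, high: float = 1.0) -> float:
    """Fraction of non-missing predictions outside [low, high]."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return sum(not (low <= v <= high) for v in present) / len(present)


def run_checks(batch: Batch, checks: Sequence[Check]):
    """Evaluate every check; return (name, value, passed, critical) tuples."""
    results = []
    for check in checks:
        value = check.metric(batch)
        results.append((check.name, value, value <= check.max_value, check.critical))
    return results


def should_alert(results) -> bool:
    """Alert only when a critical check fails, to limit alert fatigue."""
    return any(not passed and critical for _, _, passed, critical in results)


# The same list of checks could be reused across similar models.
CHECKS = [
    Check("missing predictions", share_missing, max_value=0.05, critical=True),
    Check("scores outside [0, 1]", share_out_of_range, max_value=0.0),
]

# Made-up batch of ten scores with one missing value.
batch = [0.1, 0.8, None, 0.4, 0.95, 0.3, 0.7, 0.2, 0.6, 0.5]
results = run_checks(batch, CHECKS)
alert = should_alert(results)
```

In this batch, 10% of predictions are missing, which exceeds the 5% threshold, so the critical check fails and `alert` is set. Keeping the checks in a shared list is one simple way to standardize them across similar models.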