talk-data.com

Event

PyData Berlin 2025

2025-09-01 – 2025-09-03 PyData

Filtering by: Iryna Kondrashchenko

Sessions & talks

Beyond Benchmarks: Practical Evaluation Strategies for Compound AI Systems

2025-09-02
talk

Evaluating large language models (LLMs) in real-world applications goes far beyond standard benchmarks. When LLMs are embedded in complex pipelines, choosing the right models, prompts, and parameters becomes an ongoing challenge.

In this talk, we will present a practical, human-in-the-loop evaluation framework that enables systematic improvement of LLM-powered systems based on expert feedback. By combining domain expert insights and automated evaluation methods, it is possible to iteratively refine these systems while building transparency and trust.

This talk will be valuable for anyone who wants to ensure their LLM applications can handle real-world complexity rather than merely perform well on generic benchmarks.