talk-data.com

Speaker

Nikhil Thorat

1 talk

Software Engineer (GenAI Observability), Databricks

Talks & appearances

1 activity

Creating LLM Judges to Measure Domain-Specific Agent Quality

This session is repeated. Measuring the effectiveness of domain-specific AI agents requires specialized evaluation frameworks that go beyond standard LLM benchmarks. This session explores methodologies for assessing agent quality across specialized knowledge domains, tailored workflows, and task-specific objectives. We'll demonstrate practical approaches to designing robust LLM judges that align with your business goals and provide meaningful insights into agent capabilities and limitations.

Key session takeaways include:

- Tools for creating domain-relevant evaluation datasets and benchmarks that accurately reflect real-world use cases
- An approach for creating LLM judges to measure domain-specific metrics
- Strategies for interpreting those results to drive iterative improvement in agent performance

Join us to learn how proper evaluation methodologies can transform your domain-specific agents from experimental tools to trusted enterprise solutions with measurable business value.
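To make the core idea concrete, below is a minimal sketch of an LLM judge that grades an agent's answer against a domain rubric and a reference answer. This is not the session's implementation: the insurance-claims domain, rubric, scoring criteria, model name, and example data are illustrative assumptions, and it assumes access to an OpenAI-compatible chat completions endpoint via the openai Python package.

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical domain rubric; in practice this would be tailored to your
# agent's knowledge domain, workflows, and business objectives.
JUDGE_RUBRIC = """You are evaluating an AI agent that answers questions about
insurance claims processing. Score the agent's answer from 1-5 on:
- factual_accuracy: does it agree with the reference answer and domain policy?
- completeness: does it address every part of the question?
Return JSON: {"factual_accuracy": int, "completeness": int, "rationale": str}"""

def judge_response(question: str, agent_answer: str, reference: str,
                   model: str = "gpt-4o-mini") -> dict:
    """Ask a judge model to grade one agent answer against a reference."""
    completion = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},  # request structured JSON scores
        messages=[
            {"role": "system", "content": JUDGE_RUBRIC},
            {"role": "user", "content": (
                f"Question: {question}\n"
                f"Reference answer: {reference}\n"
                f"Agent answer: {agent_answer}"
            )},
        ],
    )
    return json.loads(completion.choices[0].message.content)

if __name__ == "__main__":
    # Illustrative example row from a hypothetical evaluation dataset.
    scores = judge_response(
        question="Is water damage from a burst pipe covered under a standard policy?",
        agent_answer="Yes, sudden pipe bursts are typically covered; gradual leaks are not.",
        reference="Sudden accidental water damage is covered; gradual damage is excluded.",
    )
    print(scores)

Run over a domain-relevant evaluation dataset, the per-criterion scores and rationales can then be aggregated to track agent quality and guide iterative improvement, which is the workflow the abstract outlines.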