As generative and agentic AI systems move from prototypes to production, builders must balance innovation with trust, safety, and compliance. This talk covers:

- Evaluation gaps: multistep reasoning, tool use, and domain-specific workflows; data contamination and fragile metrics.
- Bias and safety: demographic bias, hallucinations, and unsafe autonomy, alongside regulatory and legal obligations.
- Continuous monitoring: MLOps strategies for drift detection, risk scoring, and compliance auditing in deployed systems.
- Tools and standards: open-source libraries such as LangTest and HELM, stress-test and red-teaming datasets, and guidance from NIST, CHAI, and ISO.
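As one illustration of the drift-detection idea mentioned above, the Population Stability Index (PSI) is a common, lightweight way to flag when a deployed model's input or score distribution has shifted from its baseline. The sketch below is a minimal stdlib-only implementation (the function name, bin count, and thresholds are illustrative choices, not anything prescribed by the talk); production monitoring stacks typically wrap a metric like this in scheduled jobs and alerting.

```python
import math
import random

def psi(baseline, current, bins=10):
    """Population Stability Index between two 1-D numeric samples.

    Bins are cut at baseline quantiles; a small epsilon guards against
    empty bins. A common rule of thumb: PSI < 0.1 stable, 0.1-0.25
    moderate drift, > 0.25 significant drift.
    """
    eps = 1e-6
    srt = sorted(baseline)
    # Quantile cut points derived from the baseline distribution.
    edges = [srt[int(i * (len(srt) - 1) / bins)] for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = sum(x > e for e in edges)  # bin index for x
            counts[idx] += 1
        return [c / len(sample) + eps for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

random.seed(0)
baseline = [random.gauss(0, 1) for _ in range(5000)]
same = [random.gauss(0, 1) for _ in range(5000)]       # no drift
shifted = [random.gauss(0.5, 1) for _ in range(5000)]  # mean shift

print(round(psi(baseline, same), 3))     # near zero: stable
print(round(psi(baseline, shifted), 3))  # elevated: drift flagged
```

The same pattern applies to model outputs (e.g. risk scores or refusal rates), which is often where drift in an LLM-backed system first becomes visible.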
talk-data.com
Governing and Evaluating Generative & Agentic AI in Regulated Industries