Search – talk-data.com

Governing and Evaluating Generative & Agentic AI in Regulated Industries
2025-10-29 · 18:50

David Talby – CEO @ John Snow Labs and Pacific AI

As generative and agentic AI systems move from prototypes to production, builders must balance innovation with trust, safety, and compliance. This talk covers evaluation gaps (multi-step reasoning, tool use, and domain-specific workflows; contamination and fragile metrics), bias and safety (demographic bias, hallucinations, and unsafe autonomy, along with the regulatory and legal obligations they trigger), continuous monitoring (MLOps strategies for drift detection, risk scoring, and compliance auditing in deployed systems), and tools and standards (open-source libraries like LangTest and HELM, stress-test and red-teaming datasets, and guidance from NIST, CHAI, and ISO).

Event: Governing and Evaluating Generative & Agentic AI in Regulated Industries

Governing and Evaluating Generative & Agentic AI in Regulated Industries
2025-10-29 · 11:55

David Talby – CEO @ John Snow Labs and Pacific AI

As generative and agentic AI systems move from prototypes to production, builders must balance innovation with trust, safety, and compliance. This talk explores the unique evaluation and monitoring challenges of next-generation AI, using healthcare, one of the most regulated domains, as a case study:

Evaluation gaps: why conventional benchmarks miss multi-step reasoning, tool use, and domain-specific workflows, and how contamination and fragile metrics distort results.
Bias and safety: demographic bias, hallucinations, and unsafe autonomy that trigger regulatory, legal, and contractual obligations for fairness and safety assessments.
Continuous monitoring: practical MLOps strategies for drift detection, risk scoring, and compliance auditing in deployed systems.
Tools and standards: open-source libraries like LangTest and HELM, new stress-test and red-teaming datasets, and emerging guidance from NIST, CHAI, and ISO.

While the examples draw heavily from healthcare, the lessons apply broadly to anyone building and deploying generative or agentic AI systems in highly regulated industries where safety, fairness, and compliance are paramount.

Event: Governing and Evaluating Generative & Agentic AI in Regulated Industries