talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

63

Filtering by: LLM

Sessions & talks

Showing 1–25 of 63 · Newest first

AI Evaluation from First Principles: You Can't Manage What You Can't Measure

2025-06-12 Watch
talk
Pallavi Koppol (Databricks) , Jonathan Frankle (Databricks)

Is your AI evaluation process holding back your system's true potential? Many organizations struggle with improving GenAI quality because they don't know how to measure it effectively. This research session covers the principles of GenAI evaluation, offers a framework for measuring what truly matters, and demonstrates implementation using Databricks.

Key takeaways:
- Practical approaches for establishing reliable metrics for subjective evaluations
- Techniques for calibrating LLM judges to enable cost-effective, scalable assessment
- Actionable frameworks for evaluation systems that evolve with your AI capabilities

Whether you're developing models, implementing AI solutions, or leading technical teams, this session will equip you to define meaningful quality metrics for your specific use cases and build evaluation systems that expose what's working and what isn't, transforming AI guesswork into measurable success.

Automating Taxonomy Generation With Compound AI on Databricks

2025-06-12 Watch
talk
Allistair Cota (Lovelytics) , Sudhir Gajre (Lovelytics)

Taxonomy generation is a challenge across industries such as retail, manufacturing and e-commerce. Incomplete or inconsistent taxonomies can lead to fragmented data insights, missed monetization opportunities and stalled revenue growth. In this session, we will explore a modern approach to solving this problem by leveraging Databricks platform to build a scalable compound AI architecture for automated taxonomy generation. The first half of the session will walk you through the business significance and implications of taxonomy, followed by a technical deep dive in building an architecture for taxonomy implementation on the Databricks platform using a compound AI architecture. We will walk attendees through the anatomy of taxonomy generation, showcasing an innovative solution that combines multimodal and text-based LLMs, internal data sources and external API calls. This ensemble approach ensures more accurate, comprehensive and adaptable taxonomies that align with business needs.

Evaluation-Driven Development Workflows: Best Practices and Real-World Scenarios

2025-06-12 Watch
talk
Wenwen Xie (Databricks) , Arthur Dooner (Databricks)

In enterprise AI, Evaluation-Driven Development (EDD) ensures reliable, efficient systems by embedding continuous assessment and improvement into the AI development lifecycle. High-quality evaluation datasets are created using techniques like document analysis, synthetic data generation via Mosaic AI's synthetic data generation API, SME validation, and relevance filtering, reducing manual effort and accelerating workflows. EDD focuses on metrics such as context relevance, groundedness, and response accuracy to identify and address issues like retrieval errors or model limitations. Custom LLM judges, tailored to domain-specific needs like PII detection or tone assessment, enhance evaluations. By leveraging tools like the Mosaic AI Agent Framework, Agent Evaluation, and MLflow, EDD automates data tracking, streamlines workflows, and quantifies improvements, transforming AI development to deliver scalable, high-performing systems that drive measurable organizational value.

Sponsored by: Galileo Technologies Inc. | Taming Rogue AI Agents with Observability-Driven Evaluation

2025-06-12
talk
Atindriyo Sanyal (Galileo)

LLM agents often drift into failure when prompts, retrieval, external data, and policies interact in unpredictable ways. This technical session introduces a repeatable, metric-driven framework for detecting, diagnosing, and correcting these undesirable behaviors in agentic systems at production scale. We demonstrate how to instrument the agent loop with fine-grained signals (tool-selection quality, error rates, action progression, latency, and domain-specific metrics) and feed them into an evaluation layer (e.g., Galileo). This telemetry enables a virtuous cycle of system improvement. We present a practical example of a stock-trading system and show how brittle retrieval and faulty business logic cause undesirable behavior; we then refactor prompts and adjust the retrieval pipeline, verifying recovery through improved metrics. Attendees will learn how to add observability with minimal code change, pinpoint root causes via tracing, and drive continuous, metric-validated improvement.
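The instrumentation pattern described above can be sketched in a few lines: wrap each tool call so every invocation emits latency and error signals. The `instrumented` decorator and in-memory `TELEMETRY` store below are illustrative stand-ins, not Galileo's actual SDK; a real system would forward these records to an observability backend.

```python
import time
from functools import wraps

# In-memory sink; a production system would ship these to an evaluation layer.
TELEMETRY: list = []

def instrumented(tool_name: str):
    """Wrap an agent tool so each call emits a telemetry record."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            record = {"tool": tool_name, "ok": True}
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                record.update(ok=False, error=type(exc).__name__)
                raise
            finally:
                record["latency_s"] = time.perf_counter() - start
                TELEMETRY.append(record)
        return wrapper
    return decorator

# Hypothetical tool from a stock-trading agent, like the example in the session.
@instrumented("quote_lookup")
def quote_lookup(symbol: str) -> float:
    prices = {"ACME": 42.0}
    return prices[symbol]  # raises KeyError for unknown symbols

quote_lookup("ACME")
try:
    quote_lookup("UNKNOWN")
except KeyError:
    pass

# Derive an aggregate signal (per-tool error rate) from the raw telemetry.
error_rate = sum(not r["ok"] for r in TELEMETRY) / len(TELEMETRY)
print(error_rate)  # → 0.5
```

From records like these, dashboards can track error rates and latency per tool, which is the "minimal code change" observability the session refers to.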

Sponsored by: DataHub | Beyond the Lakehouse: Supercharging Databricks with Contextual Intelligence

2025-06-12 Watch
lightning_talk
Gabriel Lyons (Datahub)

While Databricks powers your data lakehouse, DataHub delivers the critical context layer connecting your entire ecosystem. We'll demonstrate how DataHub extends Unity Catalog to provide comprehensive metadata intelligence across platforms. With DataHub's real-time platform you can:
- Cut AI model time-to-market with unified REST and GraphQL APIs that ensure models train on reliable and compliant data from across platforms, with complete lineage tracking
- Decrease data incidents by 60% using an event-driven architecture that instantly propagates changes across systems
- Transform data discovery from days to minutes with AI-powered search and natural language interfaces

Leaders use DataHub to transform Databricks data into integrated insights that drive business value. See our demo of syncback technology, which detects sensitive data and enforces Databricks access controls automatically, plus our AI assistant that enhances LLMs with cross-platform metadata.

From Days to Minutes - AI Transforms Audit at KPMG

2025-06-12 Watch
talk
David Tempelmann (Databricks) , Mark Wallington (KPMG UK)

Imagine performing complex regulatory checks in minutes instead of days. We made this a reality using GenAI on the Databricks Data Intelligence Platform. Join us for a deep dive into our journey from POC to a production-ready AI audit tool. Discover how we automated thousands of legal requirement checks in annual reports with remarkable speed and accuracy. Learn our blueprint for:
- High-performance AI: building a scalable, >90% accurate AI system with an optimized RAG pipeline that auditors praise
- Robust productionization: achieving secure, governed deployment using Unity Catalog, MLflow, LLM-based evaluation, and MLOps best practices

This session provides actionable insights for deploying impactful, compliant GenAI in the enterprise.

Sponsored by: Meta | Supercharge Your Apps with Llama 4: Essential Tools and Techniques for Developers

2025-06-12 Watch
talk

Dive into the latest Llama 4 models. See for yourself how to unleash the power of Llama models and achieve next level performance with our curated set of practical tools, techniques and recipes. Join us as we dive into the world of Llama models, exploring their capabilities, developer tools, and exciting use cases. Discover how these innovative models are transforming industries and improving performance in real-world applications.

Automating Engineering with AI - LLMs in Metadata Driven Frameworks

2025-06-12 Watch
lightning_talk
Simon Whiteley (Advancing Analytics)

The demand for data engineering keeps growing, but data teams are bored by repetitive tasks, stumped by growing complexity and endlessly harassed by an unrelenting need for speed. What if AI could take the heavy lifting off your hands? What if we make the move away from code-generation and into config-generation — how much more could we achieve? In this session, we’ll explore how AI is revolutionizing data engineering, turning pain points into innovation. Whether you’re grappling with manual schema generation or struggling to ensure data quality, this session offers practical solutions to help you work smarter, not harder. You’ll walk away with a good idea of where AI is going to disrupt the data engineering workload, some good tips around how to accelerate your own workflows and an impending sense of doom around the future of the industry!

Founder discussion: Matei on UC, Data Intelligence and AI Governance

2025-06-12 Watch
talk
Matei Zaharia (Databricks)

Matei is a legend of open source: he started the Apache Spark project in 2009, co-founded Databricks, and worked on other widely used data and AI software, including MLflow, Delta Lake, and Dolly. His most recent research is about combining large language models (LLMs) with external data sources, such as search systems, and improving their efficiency and result quality. This will be a conversation covering the latest and greatest of UC, Data Intelligence, AI Governance, and more.

AI/BI Genie: A Look Under the Hood of Everyone's Friendly, Neighborhood GenAI Product

2025-06-12 Watch
talk
Amir Hormati (Databricks) , Alnur Ali (Databricks)

Go beyond the user interface and explore the cutting-edge technology driving AI/BI Genie. This session breaks down the AI/BI Genie architecture, showcasing how LLMs, retrieval-augmented generation (RAG) and finely tuned knowledge bases work together to deliver fast, accurate responses. We’ll also explore how AI agents orchestrate workflows, optimize query performance and continuously refine their understanding. Ideal for those who want to geek out about the tech stack behind Genie, this session offers a rare look at the magic under the hood.

Beyond AI Accuracy: Building Trustworthy and Responsible AI Application Through Mosaic AI Framework

2025-06-12 Watch
talk
Ananya Roy (Databricks)

Generic LLM metrics are useless until they meet your business needs. In this session we will dive deep into creating bespoke, state-of-the-art AI metrics that matter to you, and discuss best practices for LLM evaluation strategies, when to use an LLM judge vs. statistical metrics, and more. Through a live demo using the Mosaic AI Framework, we will showcase how to:
- Build your own custom AI metric tailored to the needs of your GenAI application
- Implement an autonomous AI evaluation suite for complex, multi-agent systems
- Generate ground truth data at scale and apply production monitoring strategies

Drawing from extensive experience working with customers on real-world use cases, we will share actionable insights on building a robust AI evaluation framework. By the end of this session, you'll be equipped to create AI solutions that are not only powerful but also relevant to your organization's needs. Join us to transform your AI strategy and make a tangible impact on your business!

Building Responsible AI Agents on Databricks

2025-06-12 Watch
talk
Pavithra Rao (Databricks) , Yassine Essawabi (Databricks)

This presentation explores how Databricks' Data Intelligence Platform supports the development and deployment of responsible AI in credit decisioning, ensuring fairness, transparency and regulatory compliance. Key areas include bias and fairness monitoring using Lakehouse Monitoring to track demographic metrics and automated alerts for fairness thresholds. Transparency and explainability are enhanced through the Mosaic AI Agent Framework, SHAP values and LIME for feature importance auditing. Regulatory alignment is achieved via Unity Catalog for data lineage and AIBI dashboards for compliance monitoring. Additionally, LLM reliability and security are ensured through AI guardrails and synthetic datasets to validate model outputs and prevent discriminatory patterns. The platform integrates real-time SME and user feedback via Databricks Apps and AI/BI Genie Space.

Sponsored by: Securiti | Safely Curating Data to Enable Enterprise AI with Databricks

2025-06-12 Watch
lightning_talk
Jocelyn Houle (Securiti.ai)

This session will explore how developers can easily select, extract, filter, and control data pre-ingestion to accelerate safe AI. Learn how the Securiti and Databricks partnership empowers Databricks users by providing the critical foundation for unlocking scalability and accelerating trustworthy AI development and adoption.

Key takeaways:
● Understand how to leverage data intelligence to establish a foundation for frameworks like the OWASP Top 10 for LLMs, NIST AI RMF, and Gartner's AI TRiSM
● Learn how automated data curation and syncing address specific risks while accelerating AI development in Databricks
● Discover how leading organizations apply robust access controls across vast swaths of mostly unstructured data
● Learn how to maintain data provenance and control as data is moved and transformed through complex pipelines in the Databricks platform

Sponsored by: Qubika | Agentic AI In Finance: How To Build Agents Using Databricks And LangGraph

2025-06-11 Watch
lightning_talk
Sebastian Diaz (Qubika)

Join us for this session on how to build AI finance agents with Databricks and LangChain. This session introduces a powerful approach to building AI agents: a modular framework that integrates LangChain, retrieval-augmented generation (RAG), and Databricks' unified data platform to build intelligent, adaptable finance agents. We'll walk through the architecture and key components involved in building a system tailored for complex financial tasks like portfolio analysis, reporting automation, and real-time risk insights, including Databricks Unity Catalog, MLflow, and Mosaic AI. We'll also showcase a demo of one such agent in action: a Financial Analyst Agent. This agent emulates the expertise of a seasoned data analyst, delivering in-depth analysis in seconds and eliminating the need to wait hours or days for manual reports. The solution provides organizations with 24/7 access to advanced data analysis, enabling faster, smarter decision-making.

Sponsored by: West Monroe | Disruptive Forces: LLMs and the New Age of Data Engineering

2025-06-11 Watch
lightning_talk
Doug MacWilliams (West Monroe)

Large Language Models are unleashing a seismic shift on data engineering, challenging traditional workflows. LLMs obliterate inefficiencies and redefine productivity, automating complex tasks like documentation, code translation, and data model development with unprecedented speed and precision. Integrating LLMs into tools promises to reduce offshore dependency, fostering agile onshore innovation. Harnessing LLMs' full potential involves challenges, requiring deep dives into domain-specific data and strategic business alignment. This session will address deploying LLMs effectively, overcoming data management hurdles, and fostering collaboration between engineers and stakeholders. Join us to explore a future where LLMs redefine possibilities, embrace AI-driven innovation, and position your organization as a leader in data engineering.

Driving Secure AI Innovation with Obsidian Security, Databricks, and PointGuard AI

2025-06-11 Watch
talk
Alfredo Hickman (Obsidian Security) , JD Braun (Databricks) , Mali Gorantla (PointGuard AI)

As enterprises adopt AI and Large Language Models (LLMs), securing and governing these models - and the data used to train them - is essential. In this session, learn how Databricks Partner PointGuard AI helps organizations implement the Databricks AI Security Framework to manage AI-specific risks, ensuring security, compliance, and governance across the entire AI lifecycle. Then, discover how Obsidian Security provides a robust approach to AI security, enabling organizations to confidently scale AI applications.

End-to-End Interoperable Data Platform: How Bosch Leverages Databricks Supply Chain Consolidation

2025-06-11 Watch
talk
Satish Karunakaran (Robert Bosch GmbH) , Marc-Alexander Frey (Robert Bosch GmbH)

This session will showcase Bosch’s journey in consolidating supply chain information using the Databricks platform. It will dive into how Databricks not only acts as the central data lakehouse but also integrates seamlessly with transformative components such as dbt and Large Language Models (LLMs). The talk will highlight best practices, architectural considerations, and the value of an interoperable platform in driving actionable insights and operational excellence across complex supply chain processes.

Key topics and sections:
- Introduction & business context: a brief overview of Bosch’s supply chain challenges and the need for a consolidated data platform; the strategic importance of data-driven decision-making in a global supply chain environment
- Databricks as the core data platform
- Integrating dbt for transformation
- Leveraging LLMs for enhanced insights

Generative AI Merchant Matching

2025-06-11 Watch
lightning_talk
Tomáš Drietomský (Mastercard)

Our project demonstrates building enterprise AI systems cost-effectively, focusing on matching merchant descriptors to known businesses. Using fine-tuned LLMs and advanced search, we created a solution rivaling alternatives at minimal cost. The system works in three steps: a fine-tuned Llama 3 8B model parses merchant descriptors into standardized components; a hybrid search system uses these components to find candidate matches in our database; a Llama 3 70B model then evaluates top candidates, with an AI judge reviewing results for hallucination. We achieved a 400% latency improvement while maintaining accuracy and keeping costs low; each fine-tuning round cost only hundreds of dollars. Through careful optimization and a simple architecture balancing cost, speed, and accuracy, we show that small teams with modest budgets can tackle complex problems effectively using this technology. We share key insights on prompt engineering, fine-tuning, and cost and latency management.
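The three-step flow the abstract describes (parse, retrieve, judge) can be sketched as a toy pipeline. The two LLM calls are replaced by trivial stand-ins here; all function names and sample data are hypothetical, not Mastercard's actual system.

```python
# Step 1: parse a raw merchant descriptor into standardized components
# (a fine-tuned Llama 3 8B model does this in the real system).
def parse_descriptor(descriptor: str) -> dict:
    tokens = descriptor.replace("*", " ").split()
    return {
        "name": " ".join(t for t in tokens if not t.isdigit()).title(),
        "city": tokens[-1].title() if tokens else "",
    }

# Step 2: retrieve candidate matches by token overlap
# (standing in for a hybrid lexical + vector search).
def hybrid_search(components: dict, known_merchants: list) -> list:
    query = set(components["name"].lower().split())
    scored = [(len(query & set(m["name"].lower().split())), m) for m in known_merchants]
    return [m for score, m in sorted(scored, key=lambda s: -s[0]) if score > 0]

# Step 3: pick the best candidate (a larger Llama 3 70B judge in production,
# with a second review pass screening for hallucinated matches).
def judge_candidates(components: dict, candidates: list):
    return candidates[0] if candidates else None

merchants = [{"id": 1, "name": "Blue Bottle Coffee"}, {"id": 2, "name": "Joe's Pizza"}]
parsed = parse_descriptor("BLUE BOTTLE COFFEE 0042 OAKLAND")
match = judge_candidates(parsed, hybrid_search(parsed, merchants))
print(match["id"])  # → 1
```

The split into a cheap parser, a non-LLM retrieval stage, and a larger model used only on a short candidate list is what keeps the cost and latency of such a design low.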

Sponsored by: Cognizant | How Cognizant Helped RJR Transform Market Intelligence with GenAI

2025-06-11 Watch
lightning_talk
Vijay Gandapodi (Reynolds American)

Cognizant developed a GenAI-driven market intelligence chatbot for RJR using Dash UI. This chatbot leverages Databricks Vector Search for vector embeddings and semantic search, along with the DBRX-Instruct LLM model to provide accurate and contextually relevant responses to user queries. The implementation involved loading prepared metadata into a Databricks vector database using the GTE model to create vector embeddings, indexing these embeddings for efficient semantic search, and integrating the DBRX-Instruct LLM into the chat system with prompts to guide the LLM in understanding and responding to user queries. The chatbot also generated responses containing URL links to dashboards with requested numerical values, enhancing user experience and productivity by reducing report navigation and discovery time by 30%. This project stands out due to its innovative AI application, advanced reasoning techniques, user-friendly interface, and seamless integration with MicroStrategy.
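The retrieval flow the abstract outlines (embed metadata, index it, search semantically, then prompt the LLM with the retrieved context) can be sketched as follows. The bag-of-words `embed` function is an illustrative stand-in for the GTE embedding model, and no real Databricks Vector Search or DBRX-Instruct endpoint is called; all names and data are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts (GTE vectors in production)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, index: list, k: int = 1) -> list:
    """Return the k most similar indexed documents for a user query."""
    q = embed(query)
    ranked = sorted(((cosine(q, vec), doc) for doc, vec in index), reverse=True)
    return [doc for _, doc in ranked[:k]]

# Prepared metadata: dashboard descriptions with their URLs, as in the chatbot.
docs = [
    "Quarterly sales dashboard https://example.com/sales",
    "Churn model monitoring report https://example.com/churn",
]
index = [(d, embed(d)) for d in docs]

# Retrieved context is spliced into the prompt so answers stay grounded
# and can include the dashboard link the user is looking for.
context = retrieve("show me the sales dashboard", index)
prompt = f"Answer using only this context:\n{context[0]}\nQuestion: where is the sales dashboard?"
print(context[0])
```

Returning the dashboard URL inside the retrieved context is what lets the chatbot link users straight to the report, the behavior the abstract credits with cutting navigation time.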

LLMOps at Intermountain Health: A Case Study on AI Inventory Agents

2025-06-11 Watch
talk
Mark Nielsen (Intermountain Healthcare)

In this session, we will delve into the creation of an infrastructure, CI/CD processes and monitoring systems that facilitate the responsible and efficient deployment of Large Language Models (LLMs) at Intermountain Healthcare. Using the "AI Inventory Agents" project as a case study, we will showcase how an LLM Agent can assist in effort and impact estimates, as well as provide insights into various AI products, both custom-built and third-party hosted. This includes their responsible AI certification status, development status and monitoring status (lights on, performance, drift, etc.). Attendees will learn how to build and customize their own LLMOps infrastructure to ensure seamless deployment and monitoring of LLMs, adhering to responsible AI practices.

Sponsored by: Dataiku | Engineering Trustworthy AI Agents with LLM Mesh + Mosaic AI

2025-06-11 Watch
lightning_talk
Dmitri Ryssev (Dataiku)

AI agent systems hold immense promise for automating complex tasks and driving intelligent decision‑making, but only when they are engineered to be both resilient and transparent. In this session we will explore how Dataiku’s LLM Mesh pairs with Databricks Mosaic AI to streamline the entire lifecycle: ingesting and preparing data in the Lakehouse, prompt engineering LLMs hosted on Mosaic AI Model Serving Endpoints, visually orchestrating multi‑step chains, and monitoring them in real time. We’ll walk through a live demo of a Dataiku flow that connects to a Databricks hosted model, adds automated validation, lineage, and human‑in‑the‑loop review, then exposes the agent via Dataiku's Agent Connect interface. You’ll leave with actionable patterns for setting guardrails, logging decisions, and surfacing explanations—so your organization can deploy trustworthy domain‑specific agents faster & safer.

Streamlining AI Application Development With Databricks Apps

2025-06-11 Watch
lightning_talk
Domonkos Pal (Hiflylabs Zrt.)

Think Databricks is just for data and models? Think again. In this session, you’ll see how to build and scale a full-stack AI app capable of handling thousands of queries per second entirely on Databricks. No extra cloud platforms, no patchwork infrastructure. Just one unified platform with native hosting, LLM integration, secure access, and built-in CI/CD. Learn how Databricks Apps, along with services like Model Serving, Jobs, and Gateways, streamline your architecture, eliminate boilerplate, and accelerate development, from prototype to production.

Sponsored by: Snorkel AI | Evaluating and Improving Performance of Agentic Systems

2025-06-11 Watch
lightning_talk
Chris Borg (Snorkel AI)

GenAI systems are evolving beyond basic information retrieval and question answering, becoming sophisticated agents capable of managing multi-turn dialogues and executing complex, multi-step tasks autonomously. However, reliably evaluating and systematically improving their performance remains challenging. In this session, we'll explore methods for assessing the behavior of LLM-driven agentic systems, highlighting techniques and showcasing actionable insights to identify performance bottlenecks and create better-aligned, more reliable agentic AI systems.

Adobe’s Security Lakehouse: OCSF, Data Efficiency and Threat Detection at Scale

2025-06-11 Watch
talk
Bharat Gamini (Adobe) , Andrew Krioukov (Antimatter)

This session will explore how Adobe uses a sophisticated data security architecture built on the Databricks Data Intelligence Platform, along with the Open Cybersecurity Schema Framework (OCSF), to enable scalable, real-time threat detection across more than 10 PB of security data. We’ll compare different approaches to OCSF implementation and demonstrate how Adobe processes massive security datasets efficiently — reducing query times by 18%, maintaining 99.4% SLA compliance, and supporting 286 security users across 17 teams with over 4,500 daily queries. By using the Databricks platform's serverless compute, scalable architecture, and LLM-powered recommendations, Adobe has significantly improved processing speed and efficiency, resulting in substantial cost savings. We’ll also highlight how OCSF enables advanced cross-tool analytics and automation, streamlining investigations. Finally, we’ll introduce Databricks’ new open-source OCSF toolkit for scalable security data normalization and invite the community to contribute.

Generating Laughter: Testing and Evaluating the Success of LLMs for Comedy

2025-06-11 Watch
lightning_talk
Erin Staples (Galileo)

Nondeterministic AI models, like large language models (LLMs), offer immense creative potential but require new approaches to testing and scalability. Drawing from her experience running New York Times-featured Generative AI comedy shows, Erin uncovers how traditional benchmarks may fall short and how embracing unpredictability can lead to innovative, laugh-inducing results. This talk will explore methods like multi-tiered feedback loops, chaos testing and exploratory user testing, where AI outputs are evaluated not by rigid accuracy standards but by their adaptability and resonance across different contexts — from comedy generation to functional applications. Erin will emphasize the importance of establishing a root source of truth — a reliable dataset or core principle — to manage consistency while embracing creativity. Whether you’re looking to generate a few laughs of your own or explore creative uses of Generative AI, this talk will inspire and delight enthusiasts of all levels.