LLM

AI Evaluation from First Principles: You Can't Manage What You Can't Measure

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Pallavi Koppol (Databricks), Jonathan Frankle (Databricks)

AI/ML Databricks GenAI

Is your AI evaluation process holding back your system's true potential? Many organizations struggle with improving GenAI quality because they don't know how to measure it effectively. This research session covers the principles of GenAI evaluation, offers a framework for measuring what truly matters, and demonstrates implementation using Databricks.

Key Takeaways:
- Practical approaches for establishing reliable metrics for subjective evaluations
- Techniques for calibrating LLM judges to enable cost-effective, scalable assessment
- Actionable frameworks for evaluation systems that evolve with your AI capabilities

Whether you're developing models, implementing AI solutions, or leading technical teams, this session will equip you to define meaningful quality metrics for your specific use cases and build evaluation systems that expose what's working and what isn't, transforming AI guesswork into measurable success.

Automating Taxonomy Generation With Compound AI on Databricks

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Allistair Cota (Lovelytics), Sudhir Gajre (Lovelytics)

AI/ML API Databricks

Taxonomy generation is a challenge across industries such as retail, manufacturing and e-commerce. Incomplete or inconsistent taxonomies can lead to fragmented data insights, missed monetization opportunities and stalled revenue growth. In this session, we will explore a modern approach to solving this problem by leveraging the Databricks platform to build a scalable compound AI architecture for automated taxonomy generation. The first half of the session will walk you through the business significance and implications of taxonomy, followed by a technical deep dive into building a compound AI architecture for taxonomy implementation on the Databricks platform. We will walk attendees through the anatomy of taxonomy generation, showcasing an innovative solution that combines multimodal and text-based LLMs, internal data sources and external API calls. This ensemble approach ensures more accurate, comprehensive and adaptable taxonomies that align with business needs.

Evaluation-Driven Development Workflows: Best Practices and Real-World Scenarios

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Wenwen Xie (Databricks), Arthur Dooner (Databricks)

AI/ML API

In enterprise AI, Evaluation-Driven Development (EDD) ensures reliable, efficient systems by embedding continuous assessment and improvement into the AI development lifecycle. High-quality evaluation datasets are created using techniques like document analysis, synthetic data generation via Mosaic AI’s synthetic data generation API, SME validation, and relevance filtering, reducing manual effort and accelerating workflows. EDD focuses on metrics such as context relevance, groundedness, and response accuracy to identify and address issues like retrieval errors or model limitations. Custom LLM judges, tailored to domain-specific needs like PII detection or tone assessment, enhance evaluations. By leveraging tools like Mosaic AI Agent Framework and Agent Evaluation, as well as MLflow, EDD automates data tracking, streamlines workflows, and quantifies improvements, helping teams deliver scalable, high-performing systems that drive measurable organizational value.

From Days to Minutes - AI Transforms Audit at KPMG

2025-06-12 · Data + AI Summit 2025 Watch

talk

by David Tempelmann (Databricks), Mark Wallington (KPMG UK)

AI/ML Databricks GenAI MLOps RAG

Imagine performing complex regulatory checks in minutes instead of days. We made this a reality using GenAI on the Databricks Data Intelligence Platform. Join us for a deep dive into our journey from POC to a production-ready AI audit tool. Discover how we automated thousands of legal requirement checks in annual reports with remarkable speed and accuracy. Learn our blueprint for:
- High-Performance AI: Building a scalable, >90% accurate AI system with an optimized RAG pipeline that auditors praise.
- Robust Productionization: Achieving secure, governed deployment using Unity Catalog, MLflow, LLM-based evaluation, and MLOps best practices.

This session provides actionable insights for deploying impactful, compliant GenAI in the enterprise.

Sponsored by: Meta | Supercharge Your Apps with Llama 4: Essential Tools and Techniques for Developers

Automating Engineering with AI - LLMs in Metadata Driven Frameworks

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Simon Whiteley (Advancing Analytics)

AI/ML Data Engineering Data Quality

The demand for data engineering keeps growing, but data teams are bored by repetitive tasks, stumped by growing complexity and endlessly harassed by an unrelenting need for speed. What if AI could take the heavy lifting off your hands? What if we make the move away from code-generation and into config-generation — how much more could we achieve? In this session, we’ll explore how AI is revolutionizing data engineering, turning pain points into innovation. Whether you’re grappling with manual schema generation or struggling to ensure data quality, this session offers practical solutions to help you work smarter, not harder. You’ll walk away with a good idea of where AI is going to disrupt the data engineering workload, some good tips around how to accelerate your own workflows and an impending sense of doom around the future of the industry!

Founder discussion: Matei on UC, Data Intelligence and AI Governance

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Matei Zaharia (Databricks)

AI/ML Databricks Delta Spark

Matei is a legend of open source: he started the Apache Spark project in 2009, co-founded Databricks, and worked on other widely used data and AI software, including MLflow, Delta Lake, and Dolly. His most recent research is about combining large language models (LLMs) with external data sources, such as search systems, and improving their efficiency and result quality. This will be a conversation covering the latest and greatest of UC, Data Intelligence, AI Governance, and more.

AI/BI Genie: A Look Under the Hood of Everyone's Friendly, Neighborhood GenAI Product

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Amir Hormati (Databricks), Alnur Ali (Databricks)

AI/ML BI GenAI RAG

Go beyond the user interface and explore the cutting-edge technology driving AI/BI Genie. This session breaks down the AI/BI Genie architecture, showcasing how LLMs, retrieval-augmented generation (RAG) and finely tuned knowledge bases work together to deliver fast, accurate responses. We’ll also explore how AI agents orchestrate workflows, optimize query performance and continuously refine their understanding. Ideal for those who want to geek out about the tech stack behind Genie, this session offers a rare look at the magic under the hood.

Beyond AI Accuracy: Building Trustworthy and Responsible AI Application Through Mosaic AI Framework

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Ananya Roy (Databricks)

AI/ML GenAI

Generic LLM metrics are useless until they meet your business needs. In this session we will dive deep into creating bespoke, state-of-the-art AI metrics that matter to you. We will discuss best practices for LLM evaluation strategies, when to use an LLM judge vs. statistical metrics, and more. Through a live demo using Mosaic AI Framework, we will showcase:
- How you can build your own custom AI metric tailored to your needs for your GenAI application
- How to implement an autonomous AI evaluation suite for complex, multi-agent systems
- How to generate ground truth data at scale, along with production monitoring strategies

Drawing from extensive experience working with customers on real-world use cases, we will share actionable insights on building a robust AI evaluation framework. By the end of this session, you'll be equipped to create AI solutions that are not only powerful but also relevant to your organization's needs. Join us to transform your AI strategy and make a tangible impact on your business!

Building Responsible AI Agents on Databricks

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Pavithra Rao (Databricks), Yassine Essawabi (Databricks)

AI/ML BI Data Lakehouse Databricks Cyber Security

This presentation explores how Databricks' Data Intelligence Platform supports the development and deployment of responsible AI in credit decisioning, ensuring fairness, transparency and regulatory compliance. Key areas include bias and fairness monitoring, using Lakehouse Monitoring to track demographic metrics and trigger automated alerts on fairness thresholds. Transparency and explainability are enhanced through the Mosaic AI Agent Framework, SHAP values and LIME for feature importance auditing. Regulatory alignment is achieved via Unity Catalog for data lineage and AI/BI dashboards for compliance monitoring. Additionally, LLM reliability and security are ensured through AI guardrails and synthetic datasets to validate model outputs and prevent discriminatory patterns. The platform integrates real-time SME and user feedback via Databricks Apps and AI/BI Genie Space.

Sponsored by: Securiti | Safely Curating Data to Enable Enterprise AI with Databricks

Driving Secure AI Innovation with Obsidian Security, Databricks, and PointGuard AI

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Alfredo Hickman (Obsidian Security), JD Braun (Databricks), Mali Gorantla (PointGuard AI)

AI/ML Databricks Cyber Security

As enterprises adopt AI and Large Language Models (LLMs), securing and governing these models - and the data used to train them - is essential. In this session, learn how Databricks Partner PointGuard AI helps organizations implement the Databricks AI Security Framework to manage AI-specific risks, ensuring security, compliance, and governance across the entire AI lifecycle. Then, discover how Obsidian Security provides a robust approach to AI security, enabling organizations to confidently scale AI applications.

End-to-End Interoperable Data Platform: How Bosch Leverages Databricks Supply Chain Consolidation

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Satish Karunakaran (Robert Bosch GmbH), Marc-Alexander Frey (Robert Bosch GmbH)

Data Lakehouse Databricks dbt

This session will showcase Bosch’s journey in consolidating supply chain information using the Databricks platform. It will dive into how Databricks not only acts as the central data lakehouse but also integrates seamlessly with transformative components such as dbt and Large Language Models (LLMs). The talk will highlight best practices, architectural considerations, and the value of an interoperable platform in driving actionable insights and operational excellence across complex supply chain processes.

Key Topics and Sections
- Introduction & Business Context
  - Brief Overview of Bosch’s Supply Chain Challenges and the Need for a Consolidated Data Platform
  - Strategic Importance of Data-Driven Decision-Making in a Global Supply Chain Environment
- Databricks as the Core Data Platform
- Integrating dbt for Transformation
- Leveraging LLM Models for Enhanced Insights

Generative AI Merchant Matching

2025-06-11 · Data + AI Summit 2025 Watch

lightning_talk

by Tomáš Drietomský (Mastercard)

AI/ML GenAI

Our project demonstrates building enterprise AI systems cost-effectively, focusing on matching merchant descriptors to known businesses. Using fine-tuned LLMs and advanced search, we created a solution rivaling alternatives at minimal cost. The system works in three steps:
1. A fine-tuned Llama 3 8B model parses merchant descriptors into standardized components.
2. A hybrid search system uses these components to find candidate matches in our database.
3. A Llama 3 70B model then evaluates top candidates, with an AI judge reviewing results for hallucination.

We achieved a 400% latency improvement while maintaining accuracy and keeping costs low: each fine-tuning round cost hundreds of dollars. Through careful optimization and a simple architecture that balances cost, speed and accuracy, we show that small teams with modest budgets can tackle complex problems effectively using this technology. We share key insights on prompt engineering, fine-tuning and cost and latency management.
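The shape of the three-step flow in this abstract (parse, hybrid search, evaluate) can be illustrated with a small Python sketch. This is not the speakers' implementation, and every name in it is hypothetical. A regex heuristic stands in for the fine-tuned Llama 3 8B parser, a blend of token overlap and character similarity stands in for the hybrid search, and a plain score threshold stands in for the Llama 3 70B evaluation and the AI judge.

```python
import re
from difflib import SequenceMatcher

def parse_descriptor(raw: str) -> dict:
    """Step 1 stand-in (assumption): a regex heuristic instead of a fine-tuned LLM.
    Splits a raw descriptor into a normalized name and an optional trailing store id."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", raw).upper()
    tokens = cleaned.split()
    store_id = tokens[-1] if tokens and tokens[-1].isdigit() else None
    name_tokens = tokens[:-1] if store_id else tokens
    return {"name": " ".join(name_tokens), "store_id": store_id}

def keyword_score(query: str, candidate: str) -> float:
    """Lexical half of the hybrid score: Jaccard overlap of word tokens."""
    q, c = set(query.split()), set(candidate.split())
    return len(q & c) / len(q | c) if (q | c) else 0.0

def fuzzy_score(query: str, candidate: str) -> float:
    """Fuzzy half of the hybrid score: character-level similarity ratio."""
    return SequenceMatcher(None, query, candidate).ratio()

def hybrid_search(parsed: dict, merchants: list, k: int = 3, alpha: float = 0.5) -> list:
    """Step 2 stand-in: rank known merchants by a weighted blend of both scores."""
    name = parsed["name"]
    scored = [
        (alpha * keyword_score(name, m.upper())
         + (1 - alpha) * fuzzy_score(name, m.upper()), m)
        for m in merchants
    ]
    scored.sort(reverse=True)
    return scored[:k]

def pick_match(candidates: list, threshold: float = 0.6):
    """Step 3 stand-in (assumption): accept the top candidate only above a threshold.
    In the talk this role is played by a larger LLM plus an AI judge."""
    if not candidates:
        return None
    score, merchant = candidates[0]
    return merchant if score >= threshold else None

known = ["Blue Bottle Coffee", "Blue Apron", "Bottle Shop"]
parsed = parse_descriptor("SQ *BLUE BOTTLE COFFEE 0421")
match = pick_match(hybrid_search(parsed, known))
print(parsed, match)
```

Returning `None` below the threshold mirrors the purpose of the judge step: a descriptor with no convincing candidate is left unmatched rather than forced onto the nearest merchant.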

Sponsored by: Cognizant | How Cognizant Helped RJR Transform Market Intelligence with GenAI

LLMOps at Intermountain Health: A Case Study on AI Inventory Agents

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Mark Nielsen (Intermountain Healthcare)

AI/ML CI/CD

In this session, we will delve into the creation of an infrastructure, CI/CD processes and monitoring systems that facilitate the responsible and efficient deployment of Large Language Models (LLMs) at Intermountain Healthcare. Using the "AI Inventory Agents" project as a case study, we will showcase how an LLM Agent can assist in effort and impact estimates, as well as provide insights into various AI products, both custom-built and third-party hosted. This includes their responsible AI certification status, development status and monitoring status (lights on, performance, drift, etc.). Attendees will learn how to build and customize their own LLMOps infrastructure to ensure seamless deployment and monitoring of LLMs, adhering to responsible AI practices.
