talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 Databricks Summit

Activities tracked

715

Sessions & talks

Showing 101–125 of 715 · Newest first

Beyond AI Accuracy: Building Trustworthy and Responsible AI Application Through Mosaic AI Framework


2025-06-12 Watch
talk
Ananya Roy (Databricks)

Generic LLM metrics are useless until they meet your business needs. In this session we will dive deep into creating bespoke, state-of-the-art AI metrics that matter to you, and discuss best practices for LLM evaluation strategies, including when to use an LLM judge vs. statistical metrics. Through a live demo using the Mosaic AI Framework, we will showcase how to:
- Build your own custom AI metric tailored to the needs of your GenAI application
- Implement an autonomous AI evaluation suite for complex, multi-agent systems
- Generate ground truth data at scale and apply production monitoring strategies
Drawing on extensive experience working with customers on real-world use cases, we will share actionable insights on building a robust AI evaluation framework. By the end of this session, you'll be equipped to create AI solutions that are not only powerful but also relevant to your organization's needs. Join us to transform your AI strategy and make a tangible impact on your business!
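As a flavor of what a bespoke metric can look like, here is a minimal, self-contained sketch in plain Python. The `keyword_coverage_metric` function and its scoring rubric are purely illustrative assumptions, not the Mosaic AI Framework API the session demonstrates:

```python
# Illustrative custom metric: score a GenAI response by how well it
# covers required business terms, with a response-length sanity check.
# A stand-in for the kind of LLM-judge or statistical metric discussed.

def keyword_coverage_metric(response: str, required_terms: list[str],
                            max_words: int = 200) -> dict:
    """Return a 0-1 score: fraction of required terms present,
    halved if the response exceeds the word budget."""
    text = response.lower()
    missing = [t for t in required_terms if t.lower() not in text]
    coverage = 1.0 - len(missing) / len(required_terms) if required_terms else 1.0
    over_budget = len(response.split()) > max_words
    score = coverage * (0.5 if over_budget else 1.0)
    return {"score": round(score, 3), "missing": missing}

result = keyword_coverage_metric(
    "Your refund was processed; expect the credit in 3-5 business days.",
    required_terms=["refund", "credit", "business days"])
print(result["score"])  # full coverage, under budget -> 1.0
```

A production version would replace the keyword heuristic with an LLM judge or a statistical test, but the shape — a function from response to a score plus diagnostics — stays the same.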

Bridging BI Tools: Deep Dive Into AI/BI Dashboards for Power BI Practitioners


2025-06-12 Watch
talk
Marius-Cristian Panga (Databricks) , Wasim Ahmad (Databricks)

In the rapidly evolving field of data analytics, AI/BI dashboards and Power BI stand out as two formidable approaches, each offering unique strengths and catering to specific use cases. Power BI has earned its reputation for delivering user-friendly, highly customisable visualisations and reports for data analysis. AI/BI dashboards, on the other hand, have gained significant traction due to their seamless integration with the Databricks platform, making them an attractive option for data practitioners. This session will provide a comparison of these two tools, highlighting their respective features, strengths and potential limitations. Understanding the nuances between these tools is crucial for organizations aiming to make informed decisions about their data analytics strategy. This session will equip participants with the knowledge needed to select the most appropriate tool, or combination of tools, to meet their data analysis requirements and drive data-informed decision-making.

Building Responsible AI Agents on Databricks


2025-06-12 Watch
talk
Pavithra Rao (Databricks) , Yassine Essawabi (Databricks)

This presentation explores how Databricks' Data Intelligence Platform supports the development and deployment of responsible AI in credit decisioning, ensuring fairness, transparency and regulatory compliance. Key areas include bias and fairness monitoring using Lakehouse Monitoring to track demographic metrics and automated alerts for fairness thresholds. Transparency and explainability are enhanced through the Mosaic AI Agent Framework, SHAP values and LIME for feature importance auditing. Regulatory alignment is achieved via Unity Catalog for data lineage and AI/BI dashboards for compliance monitoring. Additionally, LLM reliability and security are ensured through AI guardrails and synthetic datasets to validate model outputs and prevent discriminatory patterns. The platform integrates real-time SME and user feedback via Databricks Apps and AI/BI Genie Space.

Collaborative Innovation: How to Spur Innovation While Driving Efficiency


2025-06-12 Watch
talk
Anthony Meyers (Aon) , Paul McComish (Allianz) , Antoine Amend (Databricks)

Collaboration is redefining efficiency in insurance. This session explores how technologies such as Databricks Delta Sharing, secure data clean rooms, and data marketplaces are empowering insurers to securely share and analyze data across organizational boundaries—without exposing sensitive information. Discover how these solutions streamline operations, enhance risk modeling with real-time data integration, and enable the creation of tailored products through multi-party collaboration. Learn how insurers are leveraging these collaborative data ecosystems to reduce costs, drive innovation, and deliver better customer outcomes, all while maintaining strong privacy and governance standards. Join us to see how embracing collaborative frameworks is helping insurers operate smarter, faster, and more efficiently.

Databricks in Action: Azure’s Blueprint for Secure and Cost-Effective Operations


2025-06-12 Watch
talk
Oliver Schluga (Erste Group) , Vukola Milenkovic (Erste Group)

Erste Group's transition to Azure Databricks marked a significant upgrade from a legacy system to a secure, scalable and cost-effective cloud platform. The initial architecture, characterized by a complex hub-spoke design and stringent compliance regulations, was replaced with a more efficient solution. The phased migration addressed high network costs and operational inefficiencies, resulting in a 60% reduction in networking costs and a 30% reduction in compute costs for the central team. This transformation, completed over a year, now supports real-time analytics, advanced machine learning and GenAI while ensuring compliance with European regulations. The new platform features Unity Catalog, separate data catalogs and dedicated workspaces, demonstrating a successful shift to a cloud-based machine learning environment with significant improvements in cost, performance and security.

Democratizing Data Engineering with Databricks and dbt at Ludia


2025-06-12 Watch
talk

Ludia, a leading mobile gaming company, is empowering its analysts and domain experts by democratizing data engineering with Databricks and dbt. This talk explores how Ludia enabled cross-functional teams to build and maintain production-grade data pipelines without relying solely on centralized data engineering resources—accelerating time to insight, improving data reliability, and fostering a culture of data ownership across the organization.

DSPy 3.0 — and DSPy at Databricks


2025-06-12 Watch
talk
Krista Opsahl-Ong (Databricks) , Omar Khattab (Databricks)

The DSPy OSS team at Databricks and beyond is excited to present DSPy 3.0, targeted for release close to DAIS 2025. We will present what DSPy is and how it evolved over the past year. We will discuss greatly improved prompt optimization and finetuning/RL capabilities, improved productionization and observability via thorough and native integration with MLflow, and lessons from usage of DSPy in various Databricks R&D and professional services contexts.

Elevate SQL Productivity: The Power of Notebooks and SQL Editor


2025-06-12 Watch
talk
Jason Messer (Databricks)

Writing SQL is a core part of any data analyst’s workflow, but small inefficiencies can add up, slowing down analysis and making it harder to iterate quickly. In this session, we’ll explore powerful features in the Databricks SQL editor and notebooks that help you be more productive when writing SQL on Databricks. We’ll demo the new features and the customer use cases that inspired them.

Embracing Unity Catalog and Empowering Innovation With Genie Room


2025-06-12 Watch
talk
Junghoon Lee (Bagelcode) , Soochang Chung (Bagelcode)

Bagelcode, a leader in the social casino industry, has utilized Databricks since 2018 and manages over 10,000 tables via Hive Metastore. In 2024, we embarked on a transformative journey to resolve inefficiencies and unlock new capabilities. Over five months, we redesigned ETL pipelines with Delta Lake, optimized partitioned table logs and executed a seamless migration with minimal disruption. This effort improved governance, simplified management and unlocked Unity Catalog’s advanced features. Post-migration, we integrated the Genie Room with Slack to enable natural language queries, accelerating decision-making and operational efficiency. Additionally, a lineage-powered internal tool allowed us to quickly identify and resolve issues like backfill needs or data contamination. Unity Catalog has revolutionized our data ecosystem, elevating governance and innovation. Join us to learn how Bagelcode unlocked its data’s full potential and discover strategies for your own transformation.

From 10 Hours to 10 Minutes: Unleashing the Power of Lakeflow Declarative Pipelines

2025-06-12
talk
Sidney Cardoso (Michelin) , Yash Joshi (Accenture)

How do you transform a data pipeline from sluggish 10-hour batch processing into a real-time powerhouse that delivers insights in just 10 minutes? This was the challenge we tackled at one of France's largest manufacturing companies, where data integration and analytics were mission-critical for supply chain optimization. Power BI dashboards needed to refresh every 15 minutes, but our team struggled with legacy Azure Data Factory batch pipelines. These outdated processes couldn’t keep up, delaying insights and generating up to three daily incident tickets. We identified Lakeflow Declarative Pipelines and Databricks SQL as the game-changing solution to modernize our workflow, implement quality checks, and reduce processing times. In this session, we’ll dive into the key factors behind our success:
- Pipeline modernization with Lakeflow Declarative Pipelines: improving scalability
- Data quality enforcement: clean, reliable datasets
- Seamless BI integration: using Databricks SQL to power fast, efficient queries in Power BI

Get the Most of Your Delta Lake


2025-06-12 Watch
lightning_talk
Youssef Mrini (Databricks)

Unlock the full potential of Delta Lake, the open-source storage framework for Apache Spark, with this session focused on its latest and most impactful features. Discover how capabilities like Time Travel, Column Mapping, Deletion Vectors, Liquid Clustering, UniForm interoperability, and Change Data Feed (CDF) can transform your data architecture. Learn not just what these features do, but when and how to use them to maximize performance, simplify data management, and enable advanced analytics across your lakehouse environment.

Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox


2025-06-12 Watch
talk
Tim Kessler (Redox, Inc.) , Matthew Giglia (Databricks)

The direct Redox and Databricks integration can streamline your interoperability workflows, from responding to preauthorization requests in record time to letting attending physicians know, in near real time from ADTs, about a change in risk for sepsis and readmission. Data engineers will learn how to create fully streaming ETL pipelines for ingesting, parsing and acting on insights from Redox FHIR bundles delivered directly to Unity Catalog volumes. Once available in the Lakehouse, AI/BI Dashboards and agentic frameworks help write FHIR messages back to Redox for direct pushdown to EMR systems. Parsing FHIR bundle resources has never been easier with SQL combined with the new VARIANT data type in Delta and streaming table creation against Serverless DBSQL Warehouses. We'll also use the Databricks accelerators dbignite and redoxwrite for writing and posting FHIR bundles back to Redox-integrated EMRs, and we'll extend AI/BI with Unity Catalog SQL UDFs and the Redox API for use in Genie.
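To give a feel for the parsing step, here is a minimal plain-Python sketch of flattening a FHIR bundle into one row per resource. The session itself does this in SQL with Delta's VARIANT type; the tiny bundle below is a made-up example, not real Redox output:

```python
import json

# A toy FHIR bundle (heavily trimmed; real Redox bundles are far richer).
bundle_json = """
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1",
                  "name": [{"family": "Doe", "given": ["Jane"]}]}},
    {"resource": {"resourceType": "Observation", "id": "o1",
                  "code": {"text": "Heart rate"},
                  "valueQuantity": {"value": 72}}}
  ]
}
"""

def flatten_bundle(raw: str) -> list[dict]:
    """Explode a Bundle into one row per resource, keyed by type and id,
    keeping the full resource payload for downstream semi-structured queries."""
    bundle = json.loads(raw)
    rows = []
    for entry in bundle.get("entry", []):
        res = entry["resource"]
        rows.append({"resource_type": res["resourceType"],
                     "id": res.get("id"),
                     "payload": res})
    return rows

rows = flatten_bundle(bundle_json)
print([r["resource_type"] for r in rows])  # ['Patient', 'Observation']
```

In the SQL/VARIANT version, the `payload` column would stay semi-structured and be queried with path expressions instead of being fully flattened up front.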

How Navy Federal's Enterprise Data Ecosystem Leverages Unity Catalog for Data + AI Governance


2025-06-12 Watch
talk

Navy Federal Credit Union has 200+ enterprise data sources in its enterprise data lake. These data assets are used for training 100+ machine learning models and hydrating a semantic layer that serves an average of 4,000 business users daily across the credit union. Previously, the only option for extracting data from the analytic semantic layer was to let consuming applications access it via an already-overloaded cloud data warehouse. Visualizing data lineage for 1,000+ data pipelines and their associated metadata was impossible, and understanding the granular cost of running data pipelines was a challenge. Implementing Unity Catalog opened an alternate path for accessing analytic semantic data from the lake. It also opened the door to removing duplicate data assets stored across multiple lakes, which will save hundreds of thousands of dollars in data engineering effort, compute and storage costs.

How to Migrate From Snowflake to Databricks SQL


2025-06-12 Watch
talk
Koundinya Srinivasarao (Databricks) , Matt Holzapfel (Databricks)

Migrating your Snowflake data warehouse to the Databricks Data Intelligence Platform can accelerate your data modernization journey. Though a cloud platform-to-cloud platform migration should be relatively easy, the breadth of the Databricks Platform provides flexibility and hence requires careful planning and execution. In this session, we present the migration methodology, technical approaches, automation tools, product/feature mapping, a technical demo and best practices using real-world case studies for migrating data, ELT pipelines and warehouses from Snowflake to Databricks.

Incremental Iceberg Table Replication at Scale


2025-06-12 Watch
talk
Hongyue Hongyue (Self-Employed) , Szehon Ho (Databricks)

Apache Iceberg is a popular table format for managing large analytical datasets. But replicating Iceberg tables at scale can be a daunting task — especially when dealing with its hierarchical metadata. In this talk, we present an end-to-end workflow for replicating Apache Iceberg tables, leveraging Apache Spark to ensure that backup tables remain identical to their source counterparts. More excitingly, we have contributed these libraries back to the open-source community. Attendees will gain a comprehensive understanding of how to set up replication workflows for Iceberg tables, as well as practical guidance on how to manage and maintain replicated datasets at scale. This talk is ideal for data engineers, platform architects and practitioners looking to apply replication and disaster recovery for Apache Iceberg in complex data ecosystems.

Introducing Simplified State Tracking in Apache Spark™ Structured Streaming


2025-06-12 Watch
lightning_talk
Craig Lukasik (Databricks)

This presentation will review the new change feed and snapshot capabilities in Apache Spark™ Structured Streaming’s State Reader API. The State Reader API enables users to access and analyze Structured Streaming's internal state data. Attendees will learn how to leverage the new features to debug, troubleshoot and analyze state changes efficiently, making streaming workloads easier to manage at scale.

Lakeflow in Production: CI/CD, Testing and Monitoring at Scale


2025-06-12 Watch
talk
Adriana Ispas (Databricks) , Lennart Kats (Databricks)

Building robust, production-grade data pipelines goes beyond writing transformation logic — it requires rigorous testing, version control, automated CI/CD workflows and a clear separation between development and production. In this talk, we’ll demonstrate how Lakeflow, paired with Databricks Asset Bundles (DABs), enables Git-based workflows, automated deployments and comprehensive testing for data engineering projects. We’ll share best practices for unit testing, CI/CD automation, data quality monitoring and environment-specific configurations. Additionally, we’ll explore observability techniques and performance tuning to ensure your pipelines are scalable, maintainable and production-ready.

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks


2025-06-12 Watch
lightning_talk
Alex Dean (Snowplow)

In the competitive gaming industry, understanding player behavior is key to delivering engaging experiences. Supercell, creators of Clash of Clans and Brawl Stars, faced challenges with fragmented data and limited visibility into user journeys. To address this, they partnered with Snowplow and Databricks to build a scalable, privacy-compliant data platform for real-time insights. By leveraging Snowplow’s behavioral data collection and Databricks’ Lakehouse architecture, Supercell achieved:
- Cross-platform data unification: a unified view of player actions across web, mobile and in-game
- Real-time analytics: streaming event data into Delta Lake for dynamic game balancing and engagement
- Scalable infrastructure: supporting terabytes of data during launches and live events
- AI & ML use cases: churn prediction and personalized in-game recommendations
This session explores Supercell’s data journey and AI-driven player engagement strategies.

Marketing Runs on Your Data: Why IT Holds the Keys to Customer Growth


2025-06-12 Watch
lightning_talk
Tim Haden (Epsilon)

Marketing owns the outcomes, but IT owns the infrastructure that makes those outcomes possible. In today’s data-driven landscape, the success of customer engagement and personalization strategies depends on a tight partnership between marketing and IT. This session explores how leading brands are using Databricks and Epsilon to unlock the full value of first-party data — transforming raw data into rich customer profiles, real-time engagement and measurable marketing ROI. Join Epsilon to see how a unified data foundation powers marketing to drive outcomes — with IT as the enabler of scale, governance and innovation. Key takeaways:
- How to unify first-party data and resolve identities to build rich customer profiles with Databricks and Epsilon
- Why a collaborative approach between Marketing and IT accelerates data-driven decisions and drives greater return
- How to activate personalized campaigns with precision and speed across channels — from insights to execution

Measure What Matters: Quality-Focused Monitoring for Production AI Agents


2025-06-12 Watch
talk
Eric Peter (Databricks) , Niall Turbitt (Databricks)

Ensuring the operational excellence of AI agents in production requires robust monitoring capabilities that span both performance metrics and quality evaluation. This session explores Databricks' comprehensive Mosaic Agent Monitoring solution, designed to provide visibility into deployed AI agents through an intuitive dashboard that tracks critical operational metrics and quality indicators. We'll demonstrate how to use the Agent Monitoring solution to iteratively improve a production agent that delivers a better customer support experience while decreasing the cost of delivering customer support. We will show how to:
- Identify and proactively fix a quality problem with the GenAI agent’s responses before it becomes a major issue
- Understand users’ usage patterns and implement and test a feature improvement to the GenAI agent
Key session takeaways include:
- Techniques for monitoring essential operational metrics, including request volume, latency, errors, and cost efficiency across your AI agent deployments
- Strategies for implementing continuous quality evaluation using AI judges that assess correctness, guideline adherence, and safety without requiring ground truth labels
- Best practices for setting up effective monitoring dashboards that enable dimension-based analysis across time periods, user feedback, and topic categories
- Methods for collecting and integrating end-user feedback to create a closed-loop system that drives iterative improvement of your AI agents
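To make the operational-metrics idea concrete, here is a small plain-Python sketch of computing request volume, error rate, p95 latency and cost from agent request logs. The log schema and the `agent_health` function are illustrative assumptions; the session uses Databricks' Mosaic Agent Monitoring dashboards rather than hand-rolled aggregation:

```python
from statistics import quantiles

# Illustrative request log: one dict per agent call.
logs = [
    {"latency_ms": 120, "error": False, "cost_usd": 0.004},
    {"latency_ms": 340, "error": False, "cost_usd": 0.006},
    {"latency_ms": 95,  "error": True,  "cost_usd": 0.001},
    {"latency_ms": 210, "error": False, "cost_usd": 0.005},
]

def agent_health(logs: list[dict]) -> dict:
    """Aggregate the kind of operational metrics a monitoring
    dashboard would track for a deployed agent."""
    latencies = sorted(l["latency_ms"] for l in logs)
    return {
        "requests": len(logs),
        "error_rate": sum(l["error"] for l in logs) / len(logs),
        "p95_latency_ms": quantiles(latencies, n=20)[-1],  # 95th percentile
        "total_cost_usd": round(sum(l["cost_usd"] for l in logs), 4),
    }

print(agent_health(logs)["requests"])  # 4
```

The quality side of monitoring (AI judges for correctness and safety) would add per-request scores to the same log rows, letting the same aggregation slice quality by time period or topic.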

Optimizing EV Charging Experience: Machine Learning for Accurate Charge Time Estimation


2025-06-12 Watch
talk
Sihang Chen (Rivian) , Mohammed Farag (Rivian Automotive, LLC)

Accurate charge time estimation is key to vehicle performance and user experience. We developed a scalable ML model that enhances real-time charge predictions in vehicle controls. Traditional rule-based methods struggle with dynamic factors like environment, vehicle state, and charging conditions. Our adaptive ML solution improves accuracy by 10%. We use Unity Catalog for data governance, Delta Tables for storage, and Liquid Clustering for data layout. Job schedulers manage data processing, while AutoML accelerates model selection. MLflow streamlines tracking, versioning, and deployment. A dedicated serving endpoint enables A/B testing and real-time insights. As our data ecosystem grew, scalability became critical. Our flexible ML framework was integrated into vehicle control systems within months. With live accuracy tracking and software-driven blending, we support 50,000+ weekly charge sessions, improving energy management and user experience.

Optimizing Smart Meter IIoT Data in Databricks for At-Scale Interactive Electrical Load Analytics


2025-06-12 Watch
talk
David Gibbon (Plotly)

Octave is a Plotly Dash application used daily by about 1,000 Hydro-Québec technicians and engineers to analyze smart meter load and voltage data from 4.5M meters across the province. As adoption grew, Octave’s back end was migrated to Databricks to address increasingly massive scale (>1T data points), governance and security requirements. This talk will summarize how Databricks was optimized to support performant, at-scale interactive Dash application experiences while managing complex back-end ETL processes in parallel. The talk will outline further optimizations targeting query latency and user concurrency, along with plans to increase data update frequency. Non-technology success factors reviewed will include the value of subject matter expertise, operational autonomy, code quality for long-term maintainability and proactive vendor technical support.

Powering Secure and Scalable Data Governance at PepsiCo With Unity Catalog Open APIs


2025-06-12 Watch
talk
Dipankar Kushari (Databricks) , Sudipta Das (PepsiCo)

PepsiCo, given its scale, has numerous teams leveraging different tools and engines to access data and perform analytics and AI. To streamline governance across this diverse ecosystem, PepsiCo unifies its data and AI assets under an open and enterprise-grade governance framework with Unity Catalog. In this session, we'll explore real-world examples of how PepsiCo extends Unity Catalog’s governance to all its data and AI assets, enabling secure collaboration even for teams outside Databricks. Learn how PepsiCo architects permissions using service principals and service accounts to authenticate with Unity Catalog, building a multi-engine architecture with seamless and open governance. Attendees will gain practical insights into designing a scalable, flexible data platform that unifies governance across all teams while embracing openness and interoperability.

Real-Time Botnet Defense at CVS: AI-Driven Detection and Mitigation on Databricks


2025-06-12 Watch
talk
Virender Dhiman (CVS Health) , Andrew Hinton (CVS Health)

Botnet attacks mobilize digital armies of compromised devices that continuously evolve, challenging traditional security frameworks with their high-speed, high-volume nature. In this session, we will reveal our advanced system — developed on the Databricks platform — that leverages cutting-edge AI/ML capabilities to detect and mitigate bot attacks in near-real time. We will dive into the system’s robust architecture, including scalable data ingestion, feature engineering, MLOps strategies & production deployment of the system. We will address the unique challenges of processing bulk HTTP traffic data, time-series anomaly detection and attack signature identification. We will demonstrate key business values through downtime minimization and threat response automation. With sectors like healthcare facing heightened risks, ensuring data integrity and service continuity is vital. Join us to uncover lessons learned while building an enterprise-grade solution that stays ahead of adversaries.

Real-World Impact in Healthcare: How VUMC’s Enterprise Data Platform Supports Patient Care and Leading-Edge Research


2025-06-12 Watch
talk
Peter Shave (Vanderbilt University Medical Center) , Aaron Zavora (Databricks)

Vanderbilt University Medical Center (VUMC) stands at the forefront of health informatics, harnessing the power of data to redefine patient care and make healthcare personal. Join us as we explore how VUMC enables operational and strategic analytics, supports research, and ultimately drives insights into clinical workflow in and around the Epic EHR platform.