Data + AI Summit 2025

A Prescription for Success: Leveraging DABs for Faster Deployment and Better Patient Outcomes

2025-06-10 Watch

talk

Brendon Allen (Health Catalyst) , Alex Owen (Databricks)

Analytics CI/CD Databricks

Health Catalyst (HCAT) transformed its CI/CD strategy by replacing a rigid, internal deployment tool with Databricks Asset Bundles (DABs), unlocking greater agility and efficiency. This shift streamlined deployments across both customer workspaces and HCAT's core platform, accelerating time to insights and driving continuous innovation. By adopting DABs, HCAT ensures feature parity, standardizes metric stores across clients, and rapidly delivers tailored analytics solutions. Attendees will gain practical insights into modernizing CI/CD pipelines for healthcare analytics, leveraging Databricks to scale data-driven improvements. HCAT's next-generation platform, Health Catalyst Ignite™, integrates healthcare-specific data models, self-service analytics, and domain expertise—powering faster, smarter decision-making.

Building Real-time Trading Dashboards with Lakeflow Declarative Pipelines, Serverless OLTP and Databricks Apps

2025-06-10

talk

Matt Slack (Databricks) , Matthew Moorcroft (Databricks)

Databricks Java Data Streaming

Barclays' Post Trade real-time trade monitoring platform was historically built on a complex set of legacy technologies including Java, Solace, and custom micro-services. This session will demonstrate how Lakeflow Declarative Pipelines' new real-time mode, in conjunction with the foreach_batch_sink, can enable simple, cost-effective streaming pipelines that load high volumes of data into Databricks' new Serverless OLTP database with very low latency. Once in the OLTP database, the data can be used to update real-time trading dashboards, securely hosted in Databricks Apps, with the latest stock trades, enabling better, more responsive decision-making and alerting. The session will walk through the architecture and demonstrate how simple it is to create and manage the pipelines and apps within the Databricks environment.

Cross-Region AI Model Deployment for Resiliency and Compliance

2025-06-10 Watch

talk

Greg Wood (Databricks) , Tony Farias (Databricks)

AI/ML Cloud Computing GenAI Virtual Machine

AI for enterprises, particularly in the era of GenAI, requires rapid experimentation and the ability to productionize models and agents quickly and at scale. Compliance, resilience and commercial flexibility drive the need to serve models across regions. As cloud providers struggle with rising demand for GPUs, VM shortages have become commonplace and add to the pressure of general cloud outages. Enterprises that can quickly leverage GPU capacity in other cloud regions will be better equipped to capitalize on the promise of AI, while staying flexible to serve distinct user bases and complying with regulations. In this presentation we will show and discuss how to implement AI deployments across cloud regions, deploying a model across regions and using a load balancer to determine where best to route a user request.
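The routing decision described in this abstract can be illustrated with a minimal sketch. It is not the speakers' implementation: the `RegionEndpoint` type, the region names, the latency figures and the jurisdiction labels are all hypothetical, and the policy (lowest latency among healthy, compliant regions) is just one plausible choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionEndpoint:
    """Hypothetical description of one regional model-serving endpoint."""
    name: str               # illustrative region label, e.g. "us-east"
    healthy: bool           # result of the most recent health check
    latency_ms: float       # recently observed latency to this endpoint
    allowed_for: frozenset  # user jurisdictions this region may serve


def pick_region(endpoints, user_jurisdiction):
    """Return the lowest-latency healthy endpoint allowed for the user, or None."""
    candidates = [
        e for e in endpoints
        if e.healthy and user_jurisdiction in e.allowed_for
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.latency_ms)


# Assumed example fleet: two US regions and one EU region.
endpoints = [
    RegionEndpoint("us-east", True, 40.0, frozenset({"US"})),
    RegionEndpoint("us-west", True, 65.0, frozenset({"US"})),
    RegionEndpoint("eu-west", True, 30.0, frozenset({"EU"})),
]

choice = pick_region(endpoints, "US")
print(choice.name if choice else "no region available")
```

Filtering on compliance before latency means a faster but non-compliant region is never chosen, and marking a region unhealthy (for example during a GPU shortage) shifts traffic to the next-best permitted region.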

Databricks on Databricks: Powering Marketing Insights with Lakehouse

2025-06-10 Watch

talk

Elizabeth Dobbs (Databricks) , Anoop Muraleedharan (Databricks)

Analytics CDP Data Lakehouse Databricks GenAI Marketing

This presentation outlines the evolution of our marketing data strategy, focusing on how we’ve built a strong foundation using the Databricks Lakehouse. We will explore key advancements across data ingestion, strategy, and insights, highlighting the transition from legacy systems to a more scalable and intelligent infrastructure. Through real-world applications, we will showcase how unified Customer 360 insights drive personalization, predictive analytics enhance campaign effectiveness, and GenAI optimizes content creation and marketing execution. Looking ahead, we will demonstrate the next phase of our CDP, the shift toward an end-user-first analytics model powered by AI/BI, Genie and Matik, and the growing importance of clean rooms for secure data collaboration. This is just the beginning, and we are poised to unlock even greater capabilities in the future.

From Largest to Best: How We Transformed Databricks’ Biggest Workspace With Unity Catalog

2025-06-10 Watch

talk

Donghan Zhang (Databricks) , Li Yang (Databricks)

Data Lakehouse Databricks

Join us as we unveil how we transformed the largest Databricks workspace into the best-in-class lakehouse through Unity Catalog. Discover how we harnessed lineage and unified access management to build ultimate governance automation.

Highways and Hexagons: Processing Large Geospatial Datasets With H3

2025-06-10 Watch

talk

Olivia Ren (Databricks) , Petr Andreev (Mantel Group)

Data Engineering Spark

The problem of matching GPS locations to roads and local government areas (LGAs) involves handling large datasets and a number of geospatial operations. In this deep dive, we will outline the challenges of developing scalable solutions for these tasks. We will discuss our multi-step approach, first focusing on the use of H3 indexing to isolate matches with single candidates, then explaining the use of different geospatial computational techniques to accurately match points with multiple candidates. From a technical perspective, the talk will showcase the use of broadcasting and partitioning techniques and their effect on autoscaling, memory usage and effective data parallelization. This session is for anyone interested in geospatial data, Spark performance optimization and the real-world challenges of large-scale data engineering.
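The first step of the approach, isolating points that have a single candidate, can be sketched as follows. This is an illustration only: a simple square latitude/longitude grid stands in for H3 hexagonal cells so the example has no dependencies, and the cell size, road IDs and coordinates are assumptions rather than details from the talk.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # assumed cell size in degrees; stands in for an H3 resolution


def cell_of(lat, lon):
    """Map a coordinate to a grid cell ID (a stand-in for an H3 cell index)."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_index(roads):
    """roads: road_id -> list of (lat, lon) points sampled along the road."""
    index = defaultdict(set)
    for road_id, points in roads.items():
        for lat, lon in points:
            index[cell_of(lat, lon)].add(road_id)
    return index


def match_points(points, index):
    """Split GPS points into single-candidate, multi-candidate and unmatched."""
    matched, ambiguous, unmatched = {}, {}, []
    for pid, (lat, lon) in points.items():
        candidates = index.get(cell_of(lat, lon), set())
        if len(candidates) == 1:
            matched[pid] = next(iter(candidates))  # resolved by the index alone
        elif candidates:
            ambiguous[pid] = sorted(candidates)    # needs exact geometry later
        else:
            unmatched.append(pid)
    return matched, ambiguous, unmatched


# Hypothetical roads and GPS points.
roads = {
    "A": [(-33.8652, 151.2095), (-33.8855, 151.2255)],
    "B": [(-33.8755, 151.2155), (-33.8857, 151.2253)],
}
points = {
    "p1": (-33.8651, 151.2093),  # shares a cell with road A only
    "p2": (-33.8856, 151.2254),  # shares a cell with roads A and B
    "p3": (0.005, 0.005),        # no road nearby
}

matched, ambiguous, unmatched = match_points(points, build_index(roads))
print(matched, ambiguous, unmatched)
```

Only the ambiguous points need the more expensive geometric comparison, which is the motivation for using the cell index as a cheap first pass.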

Introduction to Modern Open Table Formats and Catalogs

2025-06-10 Watch

talk

Bart Samwel (Databricks) , Sirui Sun (Databricks)

AI/ML Delta Iceberg

In this session, learn why modern open table formats like Delta and Iceberg are a big deal and how they work with catalogs. Learn what motivated their creation, how they work and what benefits they can bring to your data and AI platform. Hear how these formats are becoming increasingly interoperable and what our vision is for their future.

Managing the Governed Cloud

2025-06-10 Watch

talk

Sherri Adame (General Motors) , Johnathan Powell (General Motors)

AI/ML Analytics Cloud Computing Data Governance Databricks Cyber Security

As organizations increasingly adopt Databricks as a unified platform for analytics and AI, ensuring robust data governance becomes critical for compliance, security, and operational efficiency. This presentation will explore the end-to-end framework for governing the Databricks cloud, covering key use cases, foundational governance principles, and scalable automation strategies. We will discuss best practices for metadata, data access, catalog, classification, quality, and lineage, while leveraging automation to streamline enforcement. Attendees will gain insights into best practices and real-world approaches to building a governed data cloud that balances innovation with control.

Real-Time Mode Technical Deep Dive: How We Built Sub-300 Millisecond Streaming Into Apache Spark™

2025-06-10 Watch

talk

Siying Dong (Databricks) , Jerry Peng (Databricks)

AI/ML Databricks Spark SQL Data Streaming

Real-time mode is a new low-latency execution mode for Apache Spark™ Structured Streaming. It can consistently provide p99 latencies less than 300 milliseconds for a broad set of stateless and stateful streaming queries. Our talk focuses on the technical aspects of making this possible in Spark. We’ll dive into the core architecture that enables these dramatic latency improvements, including a concurrent stage scheduler and a non-blocking shuffle. We’ll explore how we maintained Spark’s fault-tolerance guarantees, and we’ll also share specific optimizations we made to our streaming SQL operators. These architectural improvements have already enabled Databricks customers to build workloads with latencies up to 10x lower than before. Early adopters in our Private Preview have successfully implemented real-time enrichment pipelines and feature engineering for machine learning — use cases that were previously impossible at these latencies.
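For readers unfamiliar with the p99 metric quoted above, the following sketch shows one common way to compute it, the nearest-rank method. The latency samples are invented for illustration and are not measurements from real-time mode; other percentile definitions (for example, interpolating ones) give slightly different values.

```python
def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Invented end-to-end latencies in milliseconds: mostly fast, a few slow outliers.
latencies_ms = [120] * 95 + [250] * 4 + [900]

p50 = percentile_nearest_rank(latencies_ms, 50)
p99 = percentile_nearest_rank(latencies_ms, 99)
print(f"p50={p50} ms, p99={p99} ms, under 300 ms at p99: {p99 < 300}")
```

A p99 target is stricter than an average: here a single 900 ms outlier in 100 samples does not break the target, but two would.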

RecSys, Topic Modeling and Agents: Bridging the GenAI-Traditional ML Divide

2025-06-10 Watch

talk

Dan Pechi (Databricks)

AI/ML GenAI LLM

The rise of GenAI has led to a complete reinvention of how we conceptualize Data + AI. In this breakout, we will recontextualize the rise of GenAI in traditional ML paradigms, and hopefully unite the pre- and post-LLM eras. We will demonstrate when and where GenAI may prove more effective than traditional ML algorithms, and highlight problems for which the wheel is unnecessarily being reinvented with GenAI. This session will also highlight how MLflow provides a unified means of benchmarking traditional ML against GenAI, and lay out a vision for bridging the divide between Traditional ML and GenAI practitioners.

Revolutionizing Nuclear AI With HiVE and Bertha on Databricks Architecture

2025-06-10 Watch

talk

Lou Martinez Sancho (Westinghouse Electric Company)

AI/ML Databricks GenAI Hive

In this session we will explore the revolutionary advancements in nuclear AI capabilities with HiVE and Bertha on Databricks architecture. HiVE, developed by Westinghouse, leverages over a century of proprietary data to deliver unparalleled AI capabilities. At its core is Bertha, a generative AI model designed to tackle the unique challenges of the nuclear industry. This session will delve into the technical architecture of HiVE and Bertha, showcasing how Databricks' scalable environment enhances their performance. We will discuss the secure data infrastructure supporting HiVE, ensuring data integrity and compliance. Real-world applications and use cases will demonstrate the impact of HiVE and Bertha on improving efficiency, innovation and safety in nuclear operations. Discover how the fusion of HiVE and Bertha with Databricks architecture is transforming the nuclear AI landscape and driving the future of nuclear technology.

Saving Millions From Millions: Navigating Towards Cost-Efficiency in Pinterest's Spark Jobs

2025-06-10 Watch

talk

Nan Zhu (Pinterest)

Spark

While Spark offers powerful processing capabilities for massive data volumes, cost efficiency is a persistent challenge for users operating at large scale. At Pinterest, where we run millions of Spark jobs monthly, maintaining infrastructure cost efficiency is crucial to supporting our rapid business growth. To tackle this challenge, we have developed several strategies that have saved us tens of millions of dollars across numerous job instances. We will share our analytical methodology for identifying performance bottlenecks and the technical solutions we used to overcome various challenges. Our approach includes extracting insights from billions of collected metrics and leveraging remote shuffle services to address shuffle slowness, improve memory utilization and reduce costs while hosting hundreds of millions of pods. The presentation aims to trigger more discussion of Apache Spark cost efficiency in the community and to help the community tackle this common challenge.

Serverless Compute for Notebooks, Jobs and Lakeflow Declarative Pipelines

2025-06-10 Watch

talk

Roland Fäustlin (Databricks)

Databricks

Discover how Databricks serverless compute revolutionizes data workflows by eliminating infrastructure management, enabling rapid scaling and optimizing costs for Notebooks, Jobs and Lakeflow Declarative Pipelines. This session will delve into the serverless architecture, highlighting its ability to dynamically allocate resources, reduce idle costs and simplify development cycles. Learn about recent advancements, including cost savings and practical strategies for migration and optimization. Tailored for Data Engineers and Architects, this talk will also explore use cases, features, limitations and future roadmap, empowering you to make informed infrastructure decisions while unlocking the full potential of Databricks’ serverless capabilities.

Shifting Left — Setting up Your GenAI Ecosystem to Work for Business Analysts

2025-06-10 Watch

talk

James Lin (Experian)

AI/ML Analytics BI Data Science Databricks GenAI

At Data + AI Summit 2022, Databricks introduced the term "shift left" to describe how AI workloads would enable people with less data science expertise to create their own apps. In 2025, we take a look at how Experian is doing on that journey. This session highlights Databricks services that assist with the shift-left paradigm for generative AI, including how AI/BI Genie helps with generative analytics, and how Agent Studio helps with synthetic generation of test cases to validate model performance.

Streaming Meets Governance: Building AI-Ready Tables With Confluent Tableflow and Unity Catalog

2025-06-10 Watch

talk

Kasun Indrasiri Gamage (Confluent) , Victoria Bukta (Databricks)

AI/ML Analytics Data Governance Databricks Delta Kafka

Learn how Databricks and Confluent are simplifying the path from real-time data to governed, analytics- and AI-ready tables. This session will cover how Confluent Tableflow automatically materializes Kafka topics into Delta tables and registers them with Unity Catalog — eliminating the need for custom streaming pipelines. We’ll walk through how this integration helps data engineers reduce ingestion complexity, enforce data governance and make real-time data immediately usable for analytics and AI.

The Future of DSv2 in Apache Spark™

2025-06-10 Watch

talk

Anton Okolnychyi (Databricks)

API Spark

DSv2, Spark's next-generation Catalog API, is gaining traction among data source developers. It shifts complexity to Apache Spark™, improves connector reliability and unlocks new functionality such as catalog federation, MERGE operations, storage-partitioned joins, aggregate pushdown, stored procedures and more. This session covers the design of DSv2, current strengths and gaps and its evolving roadmap. It's intended for Spark users and developers working with data sources, whether custom-built or off-the-shelf.

The Next Wave of AI Applications Driven by Agentic Workflow at Adidas Using Databricks

2025-06-10

talk

Joana Ferreira (Adidas AG) , Mahavir Teraiya (Databricks)

AI/ML Databricks GenAI LLM

Curious to know how Adidas is transforming customer experience and business impact with agentic workflows, powered by Databricks? By leveraging cutting-edge tools like MosaicML’s deployment capabilities, Mosaic AI Gateway, and MLflow, Adidas built a scalable GenAI agentic infrastructure that delivers actionable insights from a growing volume of 2 million product reviews annually. The results are remarkable: a 60% latency reduction (15.5 seconds to 6 seconds); 91.67% cost savings (from transitioning to more efficient LLMs); 98.5% token efficiency, reducing input tokens from 200k to just 3k; and a 20% increase in productivity (faster time to insight). Empowering over 500 decision-makers across 150+ countries, this infrastructure is set to optimize products and services for Adidas’ 500 million members by 2025 while supporting dozens of upcoming AI-driven solutions. Join us to explore how Adidas turned agentic workflow infrastructure into a strategic advantage using Databricks, and learn how you can do the same!
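As a quick arithmetic check, the latency and token percentages can be recomputed from the before and after values quoted in the abstract. The helper name is hypothetical. The cost-savings and productivity figures are not recomputed because the abstract gives no underlying values for them.

```python
def percent_reduction(before, after):
    """Relative reduction from before to after, expressed as a percentage."""
    return (before - after) / before * 100


latency = percent_reduction(15.5, 6.0)      # seconds, as quoted in the abstract
tokens = percent_reduction(200_000, 3_000)  # input tokens, as quoted in the abstract

print(f"latency reduction: {latency:.1f}%")
print(f"token reduction:   {tokens:.1f}%")
```

The latency figure works out to roughly 61%, consistent with the rounded 60% in the abstract, and the token figure matches the quoted 98.5%.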

Unified Advanced Analytics: Integrating Power BI and Databricks Genie for Real-time Insights

2025-06-10 Watch

talk

Justin Ward (TurnPoint Services) , Edelweiss Kammermann (IT Convergence)

Analytics API Azure BI Dashboard Databricks

In today’s data-driven landscape, business users expect seamless, interactive analytics without having to switch between different environments. This presentation explores our web application that unifies a Power BI dashboard with Databricks Genie, allowing users to query and visualize insights from the same dataset within a single, cohesive interface. We will compare two integration strategies: one that leverages a traditional webpage enhanced by an Azure bot to incorporate Genie’s capabilities, and another that utilizes Databricks Apps to deliver a smoother, native experience. We use the Genie API to build this solution. Attendees will learn the architecture behind these solutions, key design considerations and challenges encountered during implementation. Join us to see live demos of both approaches, and discover best practices for delivering an all-in-one, interactive analytics experience.

Data Intelligence for Cybersecurity Forum: Insights From SAP, Anvilogic, Capital One, and Wiz

2025-06-10 Watch

talk

Jiong Liu (Wiz) , Hemanth Varma Kusampudi (SAP) , Anil Chamarthy (Capital One) , Mackenzie Kyle (Anvilogic)

AI/ML Cloud Computing Databricks Delta SAP Cyber Security

Join cybersecurity leaders from SAP, Anvilogic, Capital One, Wiz, and Databricks to explore how modern data intelligence is transforming security operations. Discover how SAP adopted a modular, AI-powered detection engineering lifecycle using Anvilogic on Databricks. Learn how Capital One built a detection and correlation engine leveraging Delta Lake, Apache Spark Streaming, and Databricks to process millions of cybersecurity events per second. Finally, see how Wiz and Databricks’ partnership enhances cloud security with seamless threat visibility. Through expert insights and live demos, gain strategies to build scalable, efficient cybersecurity powered by data and AI.

Financial Services Industry Forum: Shifting to Financial Intelligence | Sponsored by: Deloitte and AWS

2025-06-10 Watch

talk

Arsalan Tavakoli-Shiraji (Databricks) , Neema Raphael (Goldman Sachs) , Junta Nakai (Databricks) , Meggy Chung (Barclays Bank PLC)

AI/ML AWS Databricks

Overflow available: first come, first served. No reservation required. Where: Moscone South, Level 3, Room 302.
Join the 60-minute kickoff session at the Financial Services Forum to explore how data and AI transform finance. Featuring keynotes from top innovators in banking, capital markets, and insurance and exciting announcements from Databricks, this event offers invaluable insights.
What to expect:
Business imperatives: Learn how institutions drive growth, reduce risk and boost efficiency with data intelligence.
Transformative vision: See how Databricks enhances fraud prevention, automates processes and personalizes experiences.
Cutting-edge insights: Discover the latest trends in data-driven finance.
Customer success stories: Hear how global leaders leverage data and AI for a competitive edge.
Connect with C-suite executives and industry pioneers shaping financial services. Leave with actionable strategies to drive growth, ensure compliance and transform your organization through intelligence-driven decisions!

Games Industry Forum: The Games Executive Perspective on the Impact of Data and AI | Sponsored by: Sigma and AWS

2025-06-10 Watch

talk

Max Nienu (Databricks) , Dennis Ceccarelli (2K Games) , Brendan Noone (AWS) , Huntting Buckley (Databricks)

AI/ML AWS Databricks

Come hear from some of the biggest names in games about how data and AI are helping them shape their future, build better games and create player-centric experiences. In this session you’ll first hear what Databricks is hearing from games studios globally about their key priorities. We then shift to customers sharing their stories and perspectives. Dennis Ceccarelli from 2K Games shares how his passion for building immersive, player-first experiences has helped him shape the future of sports entertainment in the interactive space. You’ll leave invigorated by the impact data and AI can have on games and on players around the world, with new ideas for furthering your own impact.

Healthcare and Life Sciences Industry Forum | Sponsored by: Accenture

2025-06-10 Watch

talk

Asheesh Chhabra (Merck & Co.) , Michael Sanky (Databricks) , Barbara Latulippe (Takeda Pharmaceuticals - USA) , Jerry Thomas (Cencora)

AI/ML Databricks

Join us for an engaging 60-minute Healthcare and Life Sciences Industry Forum at the year’s premier Databricks event! You’ll hear directly from Databricks experts and industry leaders about how unifying data, governing AI models and empowering teams with data intelligence can drive meaningful change across the healthcare and life sciences continuum. Discover how data and AI are transforming the industry — helping streamline healthcare operations, personalize patient care and accelerate breakthroughs in research and development. Don’t miss this opportunity to learn about the future of data-driven healthcare.

Manufacturing and Transportation Industry Forum | Sponsored by: Deloitte and AWS

2025-06-10 Watch

talk

Victor Dsouza (Applied Materials) , Richard Masters (Virgin Atlantic Airways) , Andy Isenman (Heathrow) , Dr. Andrej Levin (Boston Consulting Group) , Shiv Trisal (Databricks) , Caitlin Gordon (Databricks)

AI/ML AWS Databricks GenAI

Join us for an inspiring forum showcasing how manufacturers and transportation leaders are turning today's challenges into tomorrow's opportunities. From automotive giants revolutionizing product development with generative AI to logistics providers optimizing routes for both cost and sustainability, discover how industry pioneers are reshaping the future of industrial operations. Highlighting this session is an exciting collaboration between Heathrow Airport and Virgin Atlantic, demonstrating how partnership and innovation are transforming the air travel experience. Learn how these leaders and other companies are using Databricks to tackle their most pressing challenges — from smart factory transformations to autonomous systems development — proving that the path to profitability and sustainability runs through intelligent operations.

Sponsored by: Coalesce | From Raw Data to Real-Time Retention: Powering Customer Health Scores on Databricks

Sponsored by: ThoughtSpot | How Chevron Fuels Cloud Data Modernization
