talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 Databricks Summit Visit website ↗

Activities tracked

117

Filtering by: Delta

Sessions & talks

Showing 51–75 of 117 · Newest first

Sponsored by: Firebolt | 10ms Queries on Iceberg: Turbocharging Your Lakehouse for Interactive Experiences with Firebolt

2025-06-11 Watch
talk
Benjamin Wagner (Firebolt)

Open table formats such as Apache Iceberg or Delta Lake have transformed the data landscape. For the first time, we’re seeing a real open storage ecosystem emerging across database vendors. So far, however, open table formats have found little adoption powering low-latency, high-concurrency analytics use cases. Data stored in open formats often gets transformed and ingested into closed systems for serving. The reason for this is simple: most modern query engines don’t properly support these workloads. In this talk, we take a look under the hood of Firebolt and dive into the work we’re doing to support low latency and high concurrency on Iceberg: caching of data and metadata, adaptive object storage reads, subresult reuse, and multi-dimensional scaling. After this session, you will know how you can build low-latency data applications on top of Iceberg. You’ll also have a deep understanding of what it takes for modern high-performance query engines to do well on these workloads.
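The subresult-reuse technique the abstract mentions can be sketched in a few lines of Python — a toy illustration only, not Firebolt's engine; the table and function names are invented:

```python
# Toy sketch of subresult reuse: cache partial query results so repeated
# or overlapping queries skip recomputation against object storage.
from functools import lru_cache

# Illustrative in-memory "table" of (region, value) rows.
TABLE = [("us", 3), ("eu", 5), ("us", 7), ("apac", 2)]

@lru_cache(maxsize=128)
def aggregate_by_region(region: str) -> int:
    # In a real engine this subresult would be a cached scan plus partial
    # aggregate; here it is a simple sum over the rows.
    return sum(v for r, v in TABLE if r == region)

print(aggregate_by_region("us"))   # computed: 10
print(aggregate_by_region("us"))   # served from cache: 10
```

In a real system the cache key would cover the query plan fragment and the table snapshot version, so cached subresults are invalidated when the underlying Iceberg table changes.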

Sponsored by: SAP | SAP Business Data Cloud: Fuel AI with SAP data products across ERP and lines-of-business

2025-06-11 Watch
talk

Unlock the power of your SAP data with SAP Business Data Cloud—a fully managed SaaS solution that unifies and governs all SAP data while seamlessly connecting it with third-party data. As part of SAP Business Data Cloud, SAP Databricks brings together trusted, semantically rich business data with industry-leading capabilities in AI, machine learning, and data engineering. Discover how to access curated SAP data products across critical business processes, enrich and harmonize your data without data copies using Delta Sharing, and leverage the results across your business data fabric. See it all in action with a demonstration.

Using Clean Rooms for Privacy-Centric Data Collaboration

2025-06-11 Watch
talk
DJ Sharkey (Databricks) , Nikhil Gaekwad (Databricks)

Databricks Clean Rooms make privacy-safe collaboration possible for data, analytics, and AI — across clouds and platforms. Built on Delta Sharing, Clean Rooms enable organizations to securely share and analyze data together in a governed, isolated environment — without ever exposing raw data. In this session, you’ll learn how to get started with Databricks Clean Rooms and unlock advanced use cases, including:
- Cross-platform collaboration and joint analytics
- Training machine learning and AI models
- Enforcing custom privacy policies
- Analyzing unstructured data
- Incorporating proprietary libraries in Python and SQL notebooks
- Auditing clean room activity for compliance
Whether you're a data scientist, engineer, or data leader, this session will equip you to drive high-value collaboration while maintaining full control over data privacy and governance.
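The core privacy model behind a clean room can be sketched in plain Python — an illustration of the idea, not the Databricks API: each party keeps its raw rows private, and only an approved aggregate result may leave the room.

```python
# Minimal clean-room sketch: both parties contribute private customer sets;
# the only query allowed to cross the boundary is an aggregate COUNT.
def clean_room_overlap(party_a_emails: set[str], party_b_emails: set[str]) -> int:
    """Approved query: return only the COUNT of overlapping customers,
    never the identifying rows themselves."""
    return len(party_a_emails & party_b_emails)

a = {"x@example.com", "y@example.com", "z@example.com"}
b = {"y@example.com", "z@example.com", "w@example.com"}
print(clean_room_overlap(a, b))  # 2 -- raw emails never cross the boundary
```

A real clean room additionally enforces who may run which queries, isolates compute, and audits every execution; the sketch shows only the "aggregates out, raw data never" contract.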

What’s new with Collaboration: Delta Sharing, Clean Room, Marketplace and the Ecosystem

2025-06-11 Watch
talk
Tao Tao (Databricks) , Harish Gaur (Databricks)

Databricks continues to redefine how organizations securely and openly collaborate on data. With new innovations like Clean Rooms for multi-party collaboration, Sharing for Lakehouse Federation, cross-platform view sharing, and Databricks Apps in the Marketplace, teams can now share and access data more easily, more cost-effectively, and across platforms — whether or not they’re using Databricks. In this session, we’ll deliver live demos of the key capabilities that power this transformation:
- Delta Sharing: the industry’s only open protocol for seamless cross-platform data sharing
- Databricks Marketplace: a central hub for discovering and monetizing data and AI assets
- Clean Rooms: a privacy-preserving solution for secure, multi-party data collaboration
Join us to see how these tools enable trusted data sharing, accelerate insights, and drive innovation across your ecosystem. Bring your questions and walk away with practical ways to put these capabilities into action today.

Advanced Data Access Control for the Exabyte Era: Scaling with Purpose

2025-06-11 Watch
talk
Arpan Ghosh (Databricks) , Shuting Zhang (Databricks)

As data-driven companies scale from small startups to global enterprises, managing secure data access becomes increasingly complex. Traditional access control models fall short at enterprise scale, where dynamic, purpose-driven access is essential. In this talk, we explore how our “Just-in-Time” Purpose-Based Access Control (PBAC) platform addresses the evolving challenges of data privacy and compliance, maintaining least privilege while ensuring productivity. Using features like Unity Catalog, Delta Sharing, and Databricks Apps, the platform delivers real-time, context-aware data governance. Leveraging JIT PBAC keeps your data secure, your engineers productive, your legal and security teams happy, and your organization future-proof in the ever-evolving compliance landscape.
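The "Just-in-Time" purpose-based idea can be sketched as follows — a hedged toy model, not the Unity Catalog API: a grant is tied to a declared purpose and expires automatically, so access defaults back to least privilege.

```python
# Toy JIT purpose-based access control: grants are keyed by (principal,
# table) and a declared purpose, each with an expiry timestamp.
import time

GRANTS = {
    # (principal, table) -> {purpose: expiry as a Unix timestamp}
    ("alice", "sales.orders"): {"fraud_investigation": time.time() + 3600},
}

def can_read(principal: str, table: str, purpose: str) -> bool:
    # Access requires an unexpired grant for this exact purpose.
    expiry = GRANTS.get((principal, table), {}).get(purpose)
    return expiry is not None and expiry > time.time()

print(can_read("alice", "sales.orders", "fraud_investigation"))  # True
print(can_read("alice", "sales.orders", "marketing"))            # False
```

The key property is that nothing needs to be revoked: once the purpose window closes, the check fails on its own.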

Delta and Databricks as a Performant Exabyte-Scale Application Backend

2025-06-11 Watch
lightning_talk
Scott Schenkein (Capital One Financial)

The Delta Lake architecture promises to provide a single, highly functional, and high-scale copy of data that can be leveraged by a variety of tools to satisfy a broad range of use cases. To date, most use cases have focused on interactive data warehousing, ETL, model training, and streaming. Real-time access is generally delegated to costly and sometimes difficult-to-scale NoSQL, indexed storage, and domain-specific specialty solutions, which provide limited functionality compared to Spark on Delta Lake. In this session, we will explore the Delta data-skipping and optimization model and discuss how Capital One leveraged it along with Databricks Photon and Spark Connect to implement a real-time web application backend. We’ll share how we built a highly functional and performant security information and event management user experience (SIEM UX) that is cost-effective.
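The data-skipping model the session refers to rests on per-file min/max statistics: a query prunes every file whose value range cannot match its predicate, so only a small slice of a very large table is read. A minimal sketch, with invented file names and stats:

```python
# Sketch of Delta-style file skipping: each data file carries min/max
# statistics for a column; a range predicate prunes non-overlapping files.
FILES = [
    {"path": "part-0.parquet", "min_ts": 100, "max_ts": 199},
    {"path": "part-1.parquet", "min_ts": 200, "max_ts": 299},
    {"path": "part-2.parquet", "min_ts": 300, "max_ts": 399},
]

def files_to_scan(ts_lo: int, ts_hi: int) -> list[str]:
    # Keep a file only if its [min_ts, max_ts] range overlaps the query range.
    return [f["path"] for f in FILES
            if f["max_ts"] >= ts_lo and f["min_ts"] <= ts_hi]

print(files_to_scan(250, 320))  # ['part-1.parquet', 'part-2.parquet']
```

Clustering the data (e.g. with OPTIMIZE/Z-ordering) tightens those per-file ranges, which is what makes skipping effective enough to back a real-time application.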

Simplified Delta Sharing With Network Security

2025-06-11 Watch
networking
Krishna Puttaswamy (Databricks) , Samrat Ray (Databricks)

Delta Sharing enables cross-domain sharing of data assets for collaboration. A practical concern providers and recipients face in doing so is the need to manually configure network and storage firewalls. This is particularly challenging for large-scale providers and recipients with strict compliance requirements. In this talk, we will describe our solution to fully eliminate these complexities. This enhances user experience, scalability and security, facilitating seamless data collaboration across diverse environments and cloud platforms.

Doordash Customer 360 Data Store and its Evolution to Become an Entity Management Framework

2025-06-10 Watch
lightning_talk
Gowri Shankar (Doordash) , Chao Wang (DoorDash)

The "Doordash Customer 360 Data Store" represents a foundational step in centralizing and managing customer profiles to enable targeting and personalized customer experiences, built on Delta Lake. This presentation will explore the initial goals and architecture of the Customer 360 Data Store, its journey to becoming a robust entity management framework, and the challenges and opportunities encountered along the way. We will discuss how the evolution addressed scalability, data governance, and integration needs, enabling the system to support dynamic and diverse use cases, including customer lifecycle analytics and segmentation-based marketing campaign targeting. Attendees will gain insight into the key design principles, technical innovations, and strategic decisions that transformed the system into a flexible platform for entity management, positioning it as a critical enabler of data-driven growth at Doordash. Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.

No Time for the Dad Bod: Automating Life with AI and Databricks

2025-06-10 Watch
lightning_talk
Sean Falconer (Confluent)

Life as a father, tech leader, and fitness enthusiast demands efficiency. To reclaim my time, I’ve built AI-driven solutions that automate everyday tasks—from research agents that prep for podcasts to multi-agent systems that plan meals—all powered by real-time data and automation. This session dives into the technical foundations of these solutions, focusing on event-driven agent design and scalable patterns for robust AI systems. You’ll discover how Databricks technologies like Delta Lake, for reliable and scalable data management, and DSPy, for streamlining the development of generative AI workflows, empower seamless decision-making and deliver actionable insights. Through detailed architecture diagrams and a live demo, I’ll showcase how to design systems that process data in motion to tackle complex, real-world problems. Whether you’re an engineer, architect, or data scientist, you’ll leave with practical strategies to integrate AI-driven automation into your workflows.

Introduction to Modern Open Table Formats and Catalogs

2025-06-10 Watch
talk
Bart Samwel (Databricks) , Sirui Sun (Databricks)

In this session, learn why modern open table formats like Delta and Iceberg are a big deal and how they work with catalogs. We'll cover what motivated their creation, how they work, and what benefits they can bring to your data and AI platform. Hear how these formats are becoming increasingly interoperable and what our vision is for their future.
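At their core, formats like Delta derive a table snapshot from an ordered transaction log of add/remove actions. A heavily simplified sketch (real logs use versioned JSON files with richer actions and checkpoints):

```python
# Illustrative sketch of replaying a Delta-style transaction log to
# reconstruct the current set of live data files.
import json

# Each log entry records one action; real Delta logs store these in
# numbered JSON files under _delta_log/.
DELTA_LOG = [
    json.dumps({"add": "part-0.parquet"}),
    json.dumps({"add": "part-1.parquet"}),
    json.dumps({"remove": "part-0.parquet"}),
    json.dumps({"add": "part-2.parquet"}),
]

def snapshot(log: list[str]) -> set[str]:
    live: set[str] = set()
    for entry in log:
        action = json.loads(entry)
        if "add" in action:
            live.add(action["add"])
        if "remove" in action:
            live.discard(action["remove"])
    return live

print(sorted(snapshot(DELTA_LOG)))  # ['part-1.parquet', 'part-2.parquet']
```

A catalog's job, in this picture, is to tell readers where each table's log lives and to govern who may read or commit to it.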

Streaming Meets Governance: Building AI-Ready Tables With Confluent Tableflow and Unity Catalog

2025-06-10 Watch
talk
Kasun Indrasiri Gamage (Confluent) , Victoria Bukta (Databricks)

Learn how Databricks and Confluent are simplifying the path from real-time data to governed, analytics- and AI-ready tables. This session will cover how Confluent Tableflow automatically materializes Kafka topics into Delta tables and registers them with Unity Catalog — eliminating the need for custom streaming pipelines. We’ll walk through how this integration helps data engineers reduce ingestion complexity, enforce data governance and make real-time data immediately usable for analytics and AI.

Data Intelligence for Cybersecurity Forum: Insights From SAP, Anvilogic, Capital One, and Wiz

2025-06-10 Watch
talk
Jiong Liu (Wiz) , Hemanth Varma Kusampudi (SAP) , Anil Chamarthy (Capital One) , Mackenzie Kyle (Anvilogic)

Join cybersecurity leaders from SAP, Anvilogic, Capital One, Wiz, and Databricks to explore how modern data intelligence is transforming security operations. Discover how SAP adopted a modular, AI-powered detection engineering lifecycle using Anvilogic on Databricks. Learn how Capital One built a detection and correlation engine leveraging Delta Lake, Apache Spark Streaming, and Databricks to process millions of cybersecurity events per second. Finally, see how Wiz and Databricks’ partnership enhances cloud security with seamless threat visibility. Through expert insights and live demos, gain strategies to build scalable, efficient cybersecurity powered by data and AI.

Kernel, Catalog, Action! Reimagining our Delta-Spark Connector with DSv2

2025-06-10 Watch
lightning_talk
Scott Sandre (Databricks)

Delta Lake is redesigning its Spark connector through the combination of three key technologies: First, we're updating our Spark APIs to DSv2 to achieve deeper catalog integration and improved integration with the Spark optimizer. Second, we're fully integrating on top of Delta Kernel to take advantage of its simplified abstraction of Delta protocol complexities, accelerating feature adoption and improving maintainability. Third, we are transforming Delta to become a catalog-aware lakehouse format with Catalog Commits, enabling more efficient metadata management, governance and query performance. Join us to explore how we're advancing Delta Lake's architecture, pushing the boundaries of metadata management and creating a more intelligent, performant data lakehouse platform.

Scaling Modern MDM With Databricks, Delta Sharing and Dun & Bradstreet

2025-06-10 Watch
lightning_talk
Anna Krayn (Dun & Bradstreet)

Master Data Management (MDM) is the foundation of a successful enterprise data strategy — delivering consistency, accuracy and trust across all systems that depend on reliable data. But how can organizations integrate trusted third-party data to enhance their MDM frameworks? How can they ensure that this master data is securely and efficiently shared across internal platforms and external ecosystems? This session explores how Dun & Bradstreet’s pre-mastered data serves as a single source of truth for customers, suppliers and vendors — reducing duplication and driving alignment across enterprise systems. With Delta Sharing, organizations can natively ingest Dun & Bradstreet data into their Databricks environment and establish a scalable, interoperable MDM framework. Delta Sharing also enables secure, real-time distribution of master data across the enterprise, ensuring that every system operates from a consistent and trusted foundation.

AI-Driven Drug Discovery: Accelerating Molecular Insights With NVIDIA and Databricks

2025-06-10 Watch
talk
Karuna Nadadur (NVIDIA) , Srijit Chandrashekhar Nair (Databricks)

This session is repeated. In the race to revolutionize healthcare and drug discovery, biopharma companies are turning to AI to streamline workflows and unlock new scientific insights. In this session, we will explore how NVIDIA BioNeMo, combined with the Databricks Delta Lakehouse, can be used to advance drug discovery for critical applications like molecular structure modeling, protein folding, and diagnostics. We’ll demonstrate how BioNeMo pre-trained models can run inference on data securely stored in Delta Lake, delivering actionable insights. By leveraging containerized solutions on Databricks’ ML Runtime with GPU acceleration, users can achieve significant performance gains compared to traditional CPU-based computation.

AI Powering Epsilon's Identity Strategy: Unified Marketing Platform on Databricks

2025-06-10 Watch
talk
Gairik Chakraborty (Epsilon Data Management) , Boaz Super (Epsilon Data Management)

Join us to hear about how Epsilon Data Management migrated Epsilon’s unique, AI-powered marketing identity solution from multi-petabyte on-prem Hadoop and data warehouse systems to a unified Databricks Lakehouse platform. This transition enabled Epsilon to further scale its Decision Sciences solution and enable new cloud-based AI research capabilities on time and within budget, without being bottlenecked by the resource constraints of on-prem systems. Learn how Delta Lake, Unity Catalog, MLflow and LLM endpoints powered massive data volume, reduced data duplication, improved lineage visibility, accelerated Data Science and AI, and enabled new data to be immediately available for consumption by the entire Epsilon platform in a privacy-safe way. Using the Databricks platform as the base for AI and Data Science at global internet scale, Epsilon deploys marketing solutions across multiple cloud providers and multiple regions for many customers.

Cloud-to-Cloud Data Sharing by Walmart: Direct Access to Omni-Channel Sales Data With Delta Sharing

2025-06-10
talk
Roberto Robles Nacif (Walmart Data Ventures) , Ajay Bhonsule (Walmart Inc.)

As first-party data becomes increasingly invaluable to organizations, Walmart Data Ventures is dedicated to bringing to life new applications of Walmart’s first-party data to better serve its customers. Through Scintilla, its integrated insights ecosystem, Walmart Data Ventures continues to expand its offerings to deliver insights and analytics that drive collaboration between our merchants, suppliers, and operators. Scintilla users can now access Walmart data using Cloud Feeds, based on Databricks Delta Sharing technologies. In the past, Walmart used API-based data sharing models, which required users to possess certain skills and technical attributes that weren’t always available. Now, with Cloud Feeds, Scintilla users can more easily access data without a dedicated technical team behind the scenes making it happen. Attendees will gain valuable insights into how Walmart has built its robust data sharing architecture, and strategies to design scalable and collaborative data sharing architectures in their own organizations.

Delta Kernel for Rust and Java

2025-06-10 Watch
talk
Nick Lanham (Databricks)

Delta Kernel makes it easy for engines and connectors to read and write Delta tables. It supports many Delta features and robust connectors, including DuckDB, ClickHouse, Spice AI, and delta-dotnet. In this session, we'll cover lessons learned about how to build a high-performance library that lets engines integrate the way they want, while not having to worry about the details of the Delta protocol. We'll talk through how we streamlined the API, as well as its changes and the motivations behind them. We'll discuss some new highlight features like write support and the ability to do CDF scans. Finally, we'll cover the future roadmap for the Kernel project and what you can expect from it over the coming year.
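A change-data-feed (CDF) scan, mentioned above, surfaces the row-level changes between two table versions. The idea can be sketched as a diff over keyed rows — a simulation for illustration, not the Kernel API:

```python
# Toy CDF sketch: diff two versions of a keyed table into change rows,
# labeled roughly like Delta's _change_type values.
def cdf(old: dict[str, int], new: dict[str, int]) -> list[tuple[str, str, int]]:
    changes = []
    for k, v in new.items():
        if k not in old:
            changes.append(("insert", k, v))
        elif old[k] != v:
            changes.append(("update_postimage", k, v))
    for k, v in old.items():
        if k not in new:
            changes.append(("delete", k, v))
    return changes

v1 = {"a": 1, "b": 2}
v2 = {"a": 1, "b": 3, "c": 4}
print(cdf(v1, v2))  # [('update_postimage', 'b', 3), ('insert', 'c', 4)]
```

A real CDF reader never materializes full versions like this; it reads the change information recorded in the transaction log and CDF files, which is what makes it cheap on large tables.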

Genie for Engineering: Optimizing HVAC Design and Operational Insights With Data and AI

2025-06-10
talk

In this session, we will explore how Genie, an AI-driven platform, transformed HVAC operational insights by leveraging Databricks offerings like Apache Spark, Delta Lake, and the Databricks Data Intelligence Platform. Key contributions:
- Real-time data processing: Lakeflow Declarative Pipelines and Apache Spark™ for efficient data ingestion and real-time analysis.
- Workflow orchestration: the Databricks Data Intelligence Platform to orchestrate complex workflows and integrate various data sources and analytical tools.
- Field data integration: incorporating real-time field data into design and algorithm development, enabling engineers to make informed adjustments and optimize performance.
By analyzing real-time data from HVAC installations, Genie identified discrepancies between design specs and field performance, allowing engineers to optimize algorithms, reduce inefficiencies, and improve customer satisfaction. Discover how Genie revolutionized HVAC management and how to apply it to your own projects.

How HP Is Optimizing the 3D Printing Supply Chain Using Delta Sharing

2025-06-10 Watch
talk

HP’s 3D Print division empowers manufacturers with telemetry data to optimize operations and streamline maintenance. Using Delta Sharing, Unity Catalog and AI/BI dashboards, HP provides a secure, scalable solution for data sharing and analytics. Delta Sharing D2O enables seamless data access, even for customers not on Databricks. Apigee masks private URLs, and Unity Catalog enhances security by managing data assets. Predictive maintenance with Mosaic AI boosts uptime by identifying issues early and alerting support teams. Custom dashboards and sample code let customers run analytics using any supported client, while Apigee simplifies access by abstracting complexity. Insights from AI/BI dashboards help HP refine its data strategy, aligning solutions with customer needs despite the complexity of diverse technologies, fragmented systems and customer-specific requirements. This fosters trust, drives innovation, and strengthens HP as a trusted partner for scalable, secure data solutions.

Unifying Data Delivery: Using Databricks as Your Enterprise Serving Layer

2025-06-10 Watch
talk
Ivan Spiriev (The World Bank) , Ivan Donev (The World Bank)

This session will take you on our journey of integrating Databricks as the core serving layer in a large enterprise, demonstrating how you can build a unified data platform that meets diverse business needs. We will walk through the steps for constructing a central serving layer by leveraging Databricks’ SQL Warehouse to efficiently deliver data to analytics tools and downstream applications. To tackle low latency requirements, we’ll show you how to incorporate an interim scalable relational database layer that delivers sub-second performance for hot data scenarios. Additionally, we’ll explore how Delta Sharing enables secure and cost-effective data distribution beyond your organization, eliminating silos and unnecessary duplication for a truly end-to-end centralized solution. This session is perfect for data architects, engineers and decision-makers looking to unlock the full potential of Databricks as a centralized serving hub.

Delta-rs Turning Five: Growing Pains and Life Lessons

2025-06-10 Watch
lightning_talk
Robert Pack (Databricks)

Five years ago, the delta-rs project embarked on a journey to bring Delta Lake's robust capabilities to the Rust and Python ecosystems. In this talk, we'll delve into the triumphs, tribulations and lessons learned along the way. We'll explore how delta-rs has matured alongside the thriving Rust data ecosystem, adapting to its evolving landscape and overcoming the challenges of maintaining a complex data project. Join us as we share insights into the project's evolution, the symbiotic relationship between delta-rs and the Rust community, and the current hurdles and future directions that lie ahead. Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.

Automated Deployment with Databricks Asset Bundles

2025-06-10
talk

This course provides a comprehensive review of DevOps principles and their application to Databricks projects. It begins with an overview of core DevOps, DataOps, continuous integration (CI), continuous deployment (CD), and testing, and explores how these principles can be applied to data engineering pipelines. The course then focuses on continuous deployment within the CI/CD process, examining tools like the Databricks REST API, SDK, and CLI for project deployment. You will learn about Databricks Asset Bundles (DABs) and how they fit into the CI/CD process. You’ll dive into their key components, folder structure, and how they streamline deployment across various target environments in Databricks. You will also learn how to add variables, modify, validate, deploy, and execute Databricks Asset Bundles for multiple environments with different configurations using the Databricks CLI. Finally, the course introduces Visual Studio Code as an Interactive Development Environment (IDE) for building, testing, and deploying Databricks Asset Bundles locally, optimizing your development process. The course concludes with an introduction to automating deployment pipelines using GitHub Actions to enhance the CI/CD workflow with Databricks Asset Bundles. By the end of this course, you will be equipped to automate Databricks project deployments with Databricks Asset Bundles, improving efficiency through DevOps practices.
Pre-requisites: Strong knowledge of the Databricks platform, including experience with Databricks Workspaces, Apache Spark, Delta Lake, the Medallion Architecture, Unity Catalog, Delta Live Tables, and Workflows. In particular, knowledge of leveraging Expectations with Lakeflow Declarative Pipelines.
Labs: Yes
Certification Path: Databricks Certified Data Engineer Professional
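A Databricks Asset Bundle is driven by a `databricks.yml` file at the project root; a minimal sketch looks roughly like this (the bundle name, host URL, job, and notebook path below are placeholders, and the real schema supports many more fields):

```yaml
# databricks.yml -- minimal illustrative bundle configuration
bundle:
  name: my_project

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://example.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://example.cloud.databricks.com

resources:
  jobs:
    nightly_etl:
      name: nightly_etl
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/etl.py
```

The typical CLI loop the course covers is then `databricks bundle validate`, `databricks bundle deploy -t dev`, and `databricks bundle run nightly_etl`.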

De-Risking Investment Decisions: QCG's Smarter Deal Evaluation Process Leveraging Databricks

2025-06-10 Watch
lightning_talk
Ian Brown (Quantum Capital Group)

Quantum Capital Group (QCG) screens hundreds of deals across the global Sustainable Energy Ecosystem, requiring deep technical due diligence. With over 1.5 billion records sourced from public, premium and proprietary datasets, their challenge was how to efficiently curate, analyze and share this data to drive smarter investment decisions. QCG partnered with Databricks & Tiger Analytics to modernize its data landscape. Using Delta tables, Spark SQL, and Unity Catalog, the team built a golden dataset that powers proprietary evaluation models and automates complex workflows. Data is now seamlessly curated, enriched and distributed — both internally and to external stakeholders — in a secure, governed and scalable way. This session explores how QCG’s investment in data intelligence has turned an overwhelming volume of information into a competitive advantage, transforming deal evaluation into a faster, more strategic process.

Machine Learning Operations

2025-06-10
talk

This course will guide participants through a comprehensive exploration of machine learning model operations, focusing on MLOps and model lifecycle management. The initial segment covers essential MLOps components and best practices, providing participants with a strong foundation for effectively operationalizing machine learning models. In the latter part of the course, we will delve into the basics of the model lifecycle, demonstrating how to navigate it seamlessly using the Model Registry in conjunction with Unity Catalog for efficient model management. By the course's conclusion, participants will have gained practical insights and a well-rounded understanding of MLOps principles, equipped with the skills needed to navigate the intricate landscape of machine learning model operations.
Pre-requisites: Familiarity with the Databricks workspace and notebooks, familiarity with Delta Lake and the Lakehouse, and intermediate-level knowledge of Python (e.g., understanding of basic MLOps concepts and practices, as well as infrastructure and the importance of monitoring MLOps solutions)
Labs: Yes
Certification Path: Databricks Certified Machine Learning Associate