talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

105

Filtering by: Data Lakehouse

Sessions & talks

Showing 51–75 of 105 · Newest first

Unified Governance and Enterprise Sharing for Data + AI

2025-06-11 Watch
talk
Molly Just-Behr (Databricks), Suresh Kaudi (World Bank), Luke Bilbro (Databricks), Marcelo Diotto (Petrobras)

The Databricks Lakehouse for Public Sector is the only enterprise data platform that lets you leverage all your data, from any source, on any workload to deliver better citizen services, warfighter support and student success, with the best outcomes, at the lowest cost and with the greatest investment protection.

What’s new with Collaboration: Delta Sharing, Clean Room, Marketplace and the Ecosystem

2025-06-11 Watch
talk
Tao Tao (Databricks), Harish Gaur (Databricks)

Databricks continues to redefine how organizations securely and openly collaborate on data. With new innovations like Clean Rooms for multi-party collaboration, Sharing for Lakehouse Federation, cross-platform view sharing and Databricks Apps in the Marketplace, teams can now share and access data more easily, cost-effectively and across platforms — whether or not they’re using Databricks. In this session, we’ll deliver live demos of the key capabilities that power this transformation:
- Delta Sharing: the industry’s only open protocol for seamless cross-platform data sharing
- Databricks Marketplace: a central hub for discovering and monetizing data and AI assets
- Clean Rooms: a privacy-preserving solution for secure, multi-party data collaboration
Join us to see how these tools enable trusted data sharing, accelerate insights and drive innovation across your ecosystem. Bring your questions and walk away with practical ways to put these capabilities into action today.
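
For readers who want something concrete to try, a minimal provider-side sketch of Delta Sharing in Databricks SQL (issued here through PySpark) is shown below; the share, table and recipient names are placeholders, not objects from the session.

    # Provider-side Delta Sharing sketch (illustrative names; assumes a Unity Catalog-enabled workspace).
    spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
    spark.sql("ALTER SHARE sales_share ADD TABLE main.retail.daily_orders")

    # Create a recipient and grant it access; open (non-Databricks) recipients receive an activation link.
    spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_analytics")
    spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_analytics")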

Enterprise Financial Crime Detection: A Lakehouse Framework for FATF, Basel III, and BSA Compliance

2025-06-11 Watch
talk
Deepak Khetpal (Tiger Analytics), Surya Sai Turaga (Databricks)

We will present a framework for FinCrime detection built on the Databricks lakehouse architecture, focusing on how institutions can achieve both the data flexibility and the ACID transaction guarantees essential for FinCrime monitoring. The framework incorporates advanced ML models for anomaly detection, pattern recognition and predictive analytics, while maintaining the clear data lineage and audit trails required by regulatory bodies. We will also discuss specific improvements in false-positive reduction, detection speed and regulatory reporting turnaround, and delve into how the architecture addresses specific FATF recommendations, Basel III risk management requirements and BSA compliance obligations, particularly in transaction monitoring and SAR filing. The ability to handle structured and unstructured data while maintaining data quality and governance makes the framework particularly valuable for large financial institutions dealing with complex, multi-jurisdictional compliance requirements.
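
The session abstract stays at the architecture level; purely as a hedged illustration of the kind of anomaly-detection step such a framework might include, the sketch below scores transactions with scikit-learn's IsolationForest. The table and column names are invented for the example.

    # Illustrative anomaly scoring over engineered transaction features (hypothetical table and columns).
    from sklearn.ensemble import IsolationForest

    features = (
        spark.table("fincrime.silver.txn_features")
        .select("amount", "txns_per_day", "cross_border_ratio", "counterparty_risk")
        .toPandas()
    )

    # Fit an unsupervised model and surface the most unusual transactions for analyst review.
    model = IsolationForest(contamination=0.01, random_state=42).fit(features)
    features["anomaly_score"] = model.decision_function(features)  # lower = more anomalous
    review_queue = features.nsmallest(100, "anomaly_score")        # candidates for SAR review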

Sponsored by: Onehouse | Open By Default, Fast By Design: One Lakehouse That Scales From BI to AI

2025-06-11 Watch
lightning_talk
Kyle Weller (Onehouse.ai)

You already see the value of the lakehouse. But are you truly maximizing its potential across all workloads, from BI to AI? In this session, Onehouse unveils how our open lakehouse architecture unifies your entire stack, enabling true interoperability across formats, catalogs, and engines. From lightning-fast ingestion at scale to cost-efficient processing and multi-catalog sync, Onehouse helps you go beyond trade-offs. Discover how Apache XTable (Incubating) enables cross-table-format compatibility, how OpenEngines puts your data in front of the best engine for the job, and how OneSync keeps data consistent across Snowflake, Athena, Redshift, BigQuery, and more. Meanwhile, our purpose-built lakehouse runtime slashes ingest and ETL costs. Whether you’re delivering BI, scaling AI, or building the next big thing, you need a lakehouse that’s open and powerful. Onehouse opens everything—so your data can power anything.

Scaling Data Quality at Zillow: Migrating and Enhancing Data Quality Systems on Databricks

2025-06-10
lightning_talk
Laura Zhou (Zillow), Firas Farah (Databricks)

Zillow has well-established, comprehensive systems for defining and enforcing data quality contracts and detecting anomalies. In this session, we will share how we evaluated Databricks’ native data quality features and why we chose Lakeflow Declarative Pipelines expectations for our declarative pipelines, along with a combination of enforced constraints and self-defined queries for other job types. Our evaluation considered factors such as performance overhead, cost and scalability. We’ll highlight key improvements over our previous system and demonstrate how these choices have enabled Zillow to enforce scalable, production-grade data quality. Additionally, we are actively testing Databricks’ latest data quality innovations, including enhancements to lakehouse monitoring and the newly released DQX project from Databricks Labs. In summary, we will cover Zillow’s approach to data quality in the lakehouse, key lessons from our migration and actionable takeaways.
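
For context, expectations in Lakeflow Declarative Pipelines (the Python module is still named dlt) are declared roughly as in the sketch below; the dataset names and rules are illustrative, not Zillow's.

    # Sketch of declarative data quality expectations (illustrative source table and rules).
    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Listings with basic data quality contracts applied")
    @dlt.expect_or_drop("valid_price", "price > 0")       # drop rows that violate the rule
    @dlt.expect("has_zip", "zip_code IS NOT NULL")        # record violations but keep the rows
    def clean_listings():
        return (
            spark.readStream.table("raw.listings")
            .withColumn("ingested_at", F.current_timestamp())
        )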

Sponsored by: Actian | Beyond the Lakehouse: Unlocking Enterprise-Wide AI-Ready Data with Unified Metadata Intelligence

2025-06-10 Watch
lightning_talk
Emma McGrattan (Actian)

As organizations scale AI initiatives on platforms like Databricks, one challenge remains: bridging the gap between the data in the lakehouse and the vast, distributed data that lives elsewhere. Turning massive volumes of technical metadata into trusted, business-ready insight requires more than cataloging what's inside the lakehouse—it demands true enterprise-wide intelligence. Actian CTO Emma McGrattan will explore how combining Databricks Unity Catalog with the Actian Data Platform extends visibility, governance, and trust beyond the lakehouse. Learn how leading enterprises are:
- Integrating metadata across all enterprise data assets for complete visibility
- Enriching Unity Catalog metadata with business context for broader usability
- Empowering non-technical users to discover, trust, and act on AI-ready data
- Building a foundation for scalable data productization with governance by design

Sponsored by: Slalom | Nasdaq's Journey from Fragmented Customer Data to AI-Ready Insights

2025-06-10 Watch
lightning_talk
Soumya Ghosh (Slalom)

Nasdaq’s rapid growth through acquisitions led to fragmented client data across multiple Salesforce instances, limiting cross-sell potential and sales insights. To solve this, Nasdaq partnered with Slalom to build a unified Client Data Hub on the Databricks Lakehouse Platform. This cloud-based solution merges CRM, product usage, and financial data into a consistent, 360° client view accessible across all Salesforce orgs with bi-directional integration. It enables personalized engagement, targeted campaigns, and stronger cross-sell opportunities across all business units. By delivering this 360 view directly in Salesforce, Nasdaq is improving sales visibility, client satisfaction, and revenue growth. The platform also enables advanced analytics like segmentation, churn prediction, and revenue optimization. With centralized data in Databricks, Nasdaq is now positioned to deploy next-gen Agentic AI and chatbots to drive efficiency and enhance sales and marketing experiences.

Unlocking the Power of Iceberg: Our Journey to a Unified Lakehouse on Databricks

2025-06-10 Watch
lightning_talk
Tomer Sabag (LSports)

This session showcases our journey of adopting Apache Iceberg™ to build a modern lakehouse architecture and leveraging Databricks advanced Iceberg support to take it to the next level. We’ll dive into the key design principles behind our lakehouse, the operational challenges we tackled and how Databricks enabled us to unlock enhanced performance, scalability and streamlined data workflows. Whether you’re exploring Apache Iceberg™ or building a lakehouse on Databricks, this session offers actionable insights, lessons learned and best practices for modern data engineering.
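
As a hedged illustration of one way Databricks exposes Iceberg alongside Delta, the sketch below enables UniForm-style Iceberg metadata on a new table; the table name is made up and the properties reflect my reading of the public docs rather than LSports' actual setup.

    # Illustrative: create a Delta table that also publishes Iceberg metadata (UniForm).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.sports.odds_events (
            event_id BIGINT, market STRING, odds DOUBLE, updated_at TIMESTAMP
        )
        TBLPROPERTIES (
            'delta.enableIcebergCompatV2' = 'true',
            'delta.universalFormat.enabledFormats' = 'iceberg'
        )
    """)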

Databricks on Databricks: Powering Marketing Insights with Lakehouse

2025-06-10 Watch
talk
Elizabeth Dobbs (Databricks), Anoop Muraleedharan (Databricks)

This presentation outlines the evolution of our marketing data strategy, focusing on how we’ve built a strong foundation using the Databricks Lakehouse. We will explore key advancements across data ingestion, strategy, and insights, highlighting the transition from legacy systems to a more scalable and intelligent infrastructure. Through real-world applications, we will showcase how unified Customer 360 insights drive personalization, predictive analytics enhance campaign effectiveness, and GenAI optimizes content creation and marketing execution. Looking ahead, we will demonstrate the next phase of our CDP, the shift toward an end-user-first analytics model powered by AI/BI, Genie and Matik, and the growing importance of clean rooms for secure data collaboration. This is just the beginning, and we are poised to unlock even greater capabilities in the future.

From Largest to Best: How We Transformed Databricks’ Biggest Workspace With Unity Catalog

2025-06-10 Watch
talk
Donghan Zhang (Databricks), Li Yang (Databricks)

Join us as we unveil how we transformed the largest Databricks workspace into the best-in-class lakehouse through Unity Catalog. Discover how we harnessed lineage and unified access management to build ultimate governance automation.

Sponsored by: Coalesce | From Raw Data to Real-Time Retention: Powering Customer Health Scores on Databricks

2025-06-10
talk
Josh Hall (Coalesce)

Understanding customer engagement and retention isn’t optional—it’s mission-critical. Join us for a live demo to see how you can build a scalable, governed customer health scoring model by transforming raw signals into actionable insights. Discover how Coalesce’s low-code development platform works seamlessly with Databricks’ lakehouse architecture to unify and operationalize customer data at scale. With built-in governance, automation, and metadata intelligence, you’ll deliver trusted scores that support proactive decision-making across the business. Why attend?
- Accelerate time-to-insight with automated, low-code transformations
- Build repeatable, enterprise-grade scoring models with full data lineage
- Ensure governance, transparency, and compliance at every step

Kernel, Catalog, Action! Reimagining our Delta-Spark Connector with DSv2

2025-06-10 Watch
lightning_talk
Scott Sandre (Databricks)

Delta Lake is redesigning its Spark connector through the combination of three key technologies: First, we're updating our Spark APIs to DSv2 to achieve deeper catalog integration and improved integration with the Spark optimizer. Second, we're fully integrating on top of Delta Kernel to take advantage of its simplified abstraction of Delta protocol complexities, accelerating feature adoption and improving maintainability. Third, we are transforming Delta to become a catalog-aware lakehouse format with Catalog Commits, enabling more efficient metadata management, governance and query performance. Join us to explore how we're advancing Delta Lake's architecture, pushing the boundaries of metadata management and creating a more intelligent, performant data lakehouse platform.
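
For background, the snippet below shows how Delta Lake plugs into Spark's existing catalog API in open-source Spark today; it is context for the talk, not the new DSv2 connector itself, and it assumes the delta-spark package is on the classpath.

    # Current OSS Spark + Delta Lake catalog wiring (background only, not the new DSv2 connector).
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("delta-catalog-demo")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )
    spark.sql("CREATE TABLE IF NOT EXISTS demo_events (id BIGINT, ts TIMESTAMP) USING delta")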

AI-Driven Drug Discovery: Accelerating Molecular Insights With NVIDIA and Databricks

2025-06-10 Watch
talk
Karuna Nadadur (NVIDIA), Srijit Chandrashekhar Nair (Databricks)

This session is repeated. In the race to revolutionize healthcare and drug discovery, biopharma companies are turning to AI to streamline workflows and unlock new scientific insights. In this session, we will explore how NVIDIA BioNeMo, combined with the Databricks Delta Lakehouse, can be used to advance drug discovery for critical applications like molecular structure modeling, protein folding and diagnostics. We’ll demonstrate how BioNeMo pre-trained models can run inference on data securely stored in Delta Lake, delivering actionable insights. By leveraging containerized solutions on Databricks’ ML Runtime with GPU acceleration, users can achieve significant performance gains compared to traditional CPU-based computation.

AI Powering Epsilon's Identity Strategy: Unified Marketing Platform on Databricks

2025-06-10 Watch
talk
Gairik Chakraborty (Epsilon Data Management), Boaz Super (Epsilon Data Management)

Join us to hear about how Epsilon Data Management migrated Epsilon’s unique, AI-powered marketing identity solution from multi-petabyte on-prem Hadoop and data warehouse systems to a unified Databricks Lakehouse platform. This transition enabled Epsilon to further scale its Decision Sciences solution and enable new cloud-based AI research capabilities on time and within budget, without being bottlenecked by the resource constraints of on-prem systems. Learn how Delta Lake, Unity Catalog, MLflow and LLM endpoints powered massive data volume, reduced data duplication, improved lineage visibility, accelerated Data Science and AI, and enabled new data to be immediately available for consumption by the entire Epsilon platform in a privacy-safe way. Using the Databricks platform as the base for AI and Data Science at global internet scale, Epsilon deploys marketing solutions across multiple cloud providers and multiple regions for many customers.

From Days to Seconds — Reducing Query Times on Large Geospatial Datasets by 99%

2025-06-10 Watch
talk
Chris Crawford (Databricks), Hobson Bryan (Global Water Security Center)

The Global Water Security Center translates environmental science into actionable insights for the U.S. Department of Defense. Prior to incorporating Databricks, responding to these requests required querying approximately five hundred thousand raster files representing over five hundred billion points. By leveraging lakehouse architecture, Databricks Auto Loader, Spark Streaming, Databricks Spatial SQL, H3 geospatial indexing and Databricks Liquid Clustering, we were able to drastically reduce our “time to analysis” from multiple business days to a matter of seconds. Now, our data scientists execute queries on pre-computed tables in Databricks, resulting in a “time to analysis” that is 99% faster, giving our teams more time for deeper analysis of the data. Additionally, we’ve incorporated Databricks Workflows, Databricks Asset Bundles, Git and GitHub Actions to support CI/CD across workspaces. We completed this work in close partnership with Databricks.
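
The abstract names the building blocks; a hedged sketch of the pre-computation pattern it describes (H3 indexing plus Liquid Clustering) could look like the following, with made-up table and column names.

    # Illustrative pre-computation: index points with H3 and cluster the table on the cell ID.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS geo.gold.precip_h3
        CLUSTER BY (h3_cell)
        AS SELECT
            h3_longlatash3(longitude, latitude, 7) AS h3_cell,   -- resolution-7 cells
            observation_date,
            precipitation_mm
        FROM geo.silver.raster_points
    """)

    # Point lookups now prune to a handful of clustered files instead of scanning everything.
    spark.sql("""
        SELECT avg(precipitation_mm)
        FROM geo.gold.precip_h3
        WHERE h3_cell = h3_longlatash3(-87.65, 41.85, 7)
    """).show()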

Geo-Powering Insights: The Art of Spatial Data Integration and Visualization

2025-06-10 Watch
talk
Mathieu Pelletier (Databricks)

In this presentation, we will explore how to leverage Databricks' SQL engine to efficiently ingest and transform geospatial data. We'll demonstrate the seamless process of connecting to external systems such as ArcGIS to retrieve datasets, showcasing the platform's versatility in handling diverse data sources. We'll then delve into the power of Databricks Apps, illustrating how you can create custom geospatial dashboards using various frameworks like Streamlit and Flask, or any framework of your choice. This flexibility allows you to tailor your visualizations to your specific needs and preferences. Furthermore, we'll highlight the Databricks Lakehouse's integration capabilities with popular dashboarding tools such as Tableau and Power BI. This integration enables you to combine the robust data processing power of Databricks with the advanced visualization features of these specialized tools.
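
As a rough sketch of the Databricks Apps pattern described here, a small Streamlit app can query a SQL warehouse with the databricks-sql-connector and plot the results; the hostname, HTTP path, token and table below are placeholders.

    # Minimal Streamlit sketch for a geospatial dashboard (placeholder connection details).
    import os
    import pandas as pd
    import streamlit as st
    from databricks import sql  # databricks-sql-connector

    st.title("Field assets map")

    with sql.connect(
        server_hostname=os.environ["DATABRICKS_HOST"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT latitude AS lat, longitude AS lon FROM geo.gold.assets LIMIT 1000")
            rows = cur.fetchall()

    # st.map expects lat/lon columns.
    st.map(pd.DataFrame(rows, columns=["lat", "lon"]))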

Unity Catalog Upgrades Made Easy. Step-by-Step Guide for Databricks Labs UCX

2025-06-10 Watch
talk
Vuong (Databricks), Liran Bareket (Databricks)

The Databricks Labs project UCX aims to optimize the Unity Catalog (UC) upgrade process, ensuring a seamless transition for businesses. This session will delve into various aspects of the UCX project, including the installation and configuration of UCX, the use of the UCX Assessment Dashboard to reduce upgrade risks and prepare effectively for a UC upgrade, and the automation of key components such as group, table and code migration. Attendees will gain comprehensive insights into leveraging UCX and Lakehouse Federation for a streamlined and efficient upgrade process. This session is aimed at customers new to UCX as well as veterans.

Gaining Insight From Image Data in Databricks Using Multi-Modal Foundation Model API

2025-06-10 Watch
lightning_talk
Ankit Mathur (Databricks)

Unlock the hidden potential in your image data without specialized computer vision expertise! This session explores how to leverage Databricks' multi-modal Foundation Model APIs to analyze, classify and extract insights from visual content. Learn how Databricks provides a unified API to understand images using powerful foundation models within your data workflows. Key takeaways:
- Implementing efficient workflows for image data processing within your Databricks lakehouse
- Understanding multi-modal foundation models for image understanding
- Integrating image analysis with other data types for business insights
- Using OpenAI-compatible APIs to query multi-modal models
- Building end-to-end pipelines from image ingestion to model deployment
Whether analyzing product images, processing visual documents or building content moderation systems, you'll discover how to extract valuable insights from your image data within the Databricks ecosystem.
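
The OpenAI-compatible pattern listed in the takeaways looks roughly like the sketch below; the endpoint name is a placeholder, and the message shape follows the standard OpenAI vision-style schema rather than anything specific to this talk.

    # Hedged sketch: query a multi-modal model through Databricks' OpenAI-compatible serving API.
    import base64
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DATABRICKS_TOKEN"],
        base_url=f"https://{os.environ['DATABRICKS_HOST']}/serving-endpoints",
    )

    with open("product_photo.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="example-multimodal-endpoint",  # placeholder endpoint name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe any visible product defects."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)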

Powering Personalization at Scale with Data: How T-Mobile and Deep Sync Help Brands Connect with Consumers

2025-06-10 Watch
lightning_talk
Jeff Frantz (T-Mobile), Pieter De Temmerman (Deep Sync)

Discover how T-Mobile and Deep Sync are redefining personalized marketing through the power of Databricks. Deep Sync, a leader in deterministic identity solutions, has brought its identity spine, which covers over 97% of U.S. households with the most current and accurate attribute data available, to the Databricks Lakehouse. T-Mobile is bringing to market for the first time a new data services business that introduces privacy-compliant, consent-based consumer data. Together, T-Mobile and Deep Sync are transforming how brands engage with consumers—enabling bespoke, hyper-personalized workflows, identity-driven insights, and closed-loop measurement through Databricks’ Multi-Party Cleanrooms. Join this session to learn how data and identity are converging to solve today’s modern marketing challenges so consumers can rediscover what it feels like to be seen, not targeted.

Gen AI Deployment and Monitoring

2025-06-10
talk

This course introduces learners to deploying, operationalizing, and monitoring generative artificial intelligence (AI) applications. First, learners will develop knowledge and skills in deploying generative AI applications using tools like Model Serving. Next, the course will discuss operationalizing generative AI applications following modern LLMOps best practices and recommended architectures. Finally, learners will be introduced to the idea of monitoring generative AI applications and their components using Lakehouse Monitoring.
Pre-requisites: Familiarity with prompt engineering and retrieval-augmented generation (RAG) techniques, including data preparation, embeddings, vectors, and vector databases; a foundational knowledge of Databricks Data Intelligence Platform tools for evaluation and governance (particularly Unity Catalog).
Labs: Yes
Certification Path: Databricks Certified Generative AI Engineer Associate
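
For orientation only, deploying a Unity Catalog-registered model to Model Serving can be scripted with MLflow's deployments client roughly as below; the endpoint and model names are placeholders and the config keys should be checked against current documentation.

    # Hedged sketch: create a Model Serving endpoint for a UC-registered model.
    from mlflow.deployments import get_deploy_client

    client = get_deploy_client("databricks")
    client.create_endpoint(
        name="example-rag-endpoint",  # placeholder
        config={
            "served_entities": [{
                "entity_name": "main.genai.rag_chain",  # placeholder UC model name
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }]
        },
    )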

Machine Learning Operations

2025-06-10
talk

This course will guide participants through a comprehensive exploration of machine learning model operations, focusing on MLOps and model lifecycle management. The initial segment covers essential MLOps components and best practices, providing participants with a strong foundation for effectively operationalizing machine learning models. In the latter part of the course, we will delve into the basics of the model lifecycle, demonstrating how to navigate it seamlessly using the Model Registry in conjunction with Unity Catalog for efficient model management. By the course's conclusion, participants will have gained practical insights and a well-rounded understanding of MLOps principles, equipped with the skills needed to navigate the intricate landscape of machine learning model operations.
Pre-requisites: Familiarity with the Databricks workspace and notebooks, familiarity with Delta Lake and the Lakehouse, intermediate-level knowledge of Python, and an understanding of basic MLOps concepts and practices as well as the infrastructure and importance of monitoring MLOps solutions.
Labs: Yes
Certification Path: Databricks Certified Machine Learning Associate
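
The Model Registry plus Unity Catalog workflow covered here boils down to a few MLflow calls, sketched below with placeholder names; the run URI is a stand-in, not a real run.

    # Hedged sketch: register a run's model in Unity Catalog and promote it with an alias.
    import mlflow
    from mlflow import MlflowClient

    mlflow.set_registry_uri("databricks-uc")  # use Unity Catalog as the model registry

    model_version = mlflow.register_model(
        "runs:/<run_id>/model",        # placeholder run URI
        "main.ml_demo.churn_model",    # placeholder three-level UC model name
    )

    MlflowClient().set_registered_model_alias(
        name="main.ml_demo.churn_model",
        alias="champion",
        version=model_version.version,
    )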

Site to Insight: Powering Construction Analytics Through Delta Sharing

2025-06-10 Watch
lightning_talk
Vinodh Thiagarajan (Procore), Vishnu Sreenivasan (Procore)

At Procore, we're transforming the construction industry through innovative data solutions. This session unveils how we've supercharged our analytics offerings using a unified lakehouse architecture and Delta Sharing, delivering game-changing results for our customers and our business, and how data professionals can unlock the full potential of their data assets and drive meaningful business outcomes. Key highlights:
- Learn how we've implemented seamless, secure sharing of large datasets across various BI tools and programming languages, dramatically accelerating time-to-insights for our customers
- Discover our approach to sharing dynamically filtered subsets of data across our numerous customers with cross-platform view sharing
- We'll demonstrate how our architecture has eliminated the need for data replication, fostering a more efficient, collaborative data ecosystem
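
On the consuming side of Delta Sharing (the flip side of what this session describes), reading a shared table takes only a few lines with the delta-sharing Python client; the profile path and table coordinates below are placeholders.

    # Hedged sketch: read a Delta Sharing table from any Python environment.
    import delta_sharing

    profile = "/path/to/config.share"  # credential file issued by the provider (placeholder path)

    # List what has been shared with us, then load one table into pandas.
    client = delta_sharing.SharingClient(profile)
    print(client.list_all_tables())

    df = delta_sharing.load_as_pandas(f"{profile}#construction_share.analytics.project_costs")
    print(df.head())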

Bridging Ontologies & Lakehouses: Palantir AIP + Databricks for Secure Autonomous AI

2025-06-10 Watch
talk
Siddhant Ekale (Palantir), Ben Abood (Databricks)

AI is moving from pilots to production, but many organizations still struggle to connect boardroom ambitions with operational reality. Palantir’s Artificial Intelligence Platform (AIP) and the Databricks Data Intelligence Platform now form a single, open architecture that closes this gap by pairing Palantir’s operational, decision-empowering Ontology with Databricks’ industry-leading scale, governance and Lakehouse economics. The result: real-time, AI-powered, autonomous workflows that are already powering mission-critical outcomes for the U.S. Department of Defense, bp and other joint customers across the public and private sectors. In this technically grounded but business-focused session you will see the new reference architecture in action. We will walk through how Unity Catalog and Palantir Virtual Tables provide governed, zero-copy access to Lakehouse data and back mission-critical operational workflows on top of Palantir’s semantic ontology and agentic AI capabilities. We will also explore how Palantir’s no-code and pro-code tooling integrates with Databricks compute to orchestrate builds and write tables to Unity Catalog. Come hear from customers currently using this architecture to drive critical business outcomes seamlessly across Databricks and Palantir.

From Datavault to Delta Lake: Streamlining Data Sync with Lakeflow Connect

2025-06-10 Watch
talk
Olivia Ren (Databricks), Andrew Clarke (Australian Red Cross Lifeblood)

In this session, we will explore the Australian Red Cross Lifeblood's approach to synchronizing an Azure SQL Datavault 2.0 (DV2.0) implementation with Unity Catalog (UC) using Lakeflow Connect. Lifeblood's DV2.0 data warehouse, which includes raw vault (RV) and business vault (BV) tables, as well as information marts defined as views, required a multi-step process to achieve data/business logic sync with UC. This involved using Lakeflow Connect to ingest RV and BV data, followed by a custom process utilizing JDBC to ingest view definitions, and the automated/manual conversion of T-SQL to Databricks SQL views, with Lakehouse Monitoring for validation. In this talk, we will share our journey, the design decisions we made, and how the resulting solution now supports analytics workloads, analysts, and data scientists at Lifeblood.
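
One step in this journey, pulling the Azure SQL view definitions over JDBC before converting the T-SQL, might be sketched as follows; the connection details, secret scope and target table are illustrative assumptions, not Lifeblood's code.

    # Hedged sketch: ingest Azure SQL view definitions for later T-SQL -> Databricks SQL conversion.
    jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=dw"  # placeholder

    view_defs = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("user", dbutils.secrets.get("lakehouse", "sql_user"))          # placeholder scope/keys
        .option("password", dbutils.secrets.get("lakehouse", "sql_password"))
        .option("query", """
            SELECT s.name AS schema_name, v.name AS view_name, m.definition
            FROM sys.views v
            JOIN sys.schemas s ON v.schema_id = s.schema_id
            JOIN sys.sql_modules m ON v.object_id = m.object_id
        """)
        .load()
    )

    # Persist the raw definitions so the automated/manual conversion step has an audit trail.
    view_defs.write.mode("overwrite").saveAsTable("lakehouse.meta.source_view_definitions")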

Lakeflow Declarative Pipelines Integrations and Interoperability: Get Data From — and to — Anywhere

2025-06-10 Watch
talk
Ryan Nienhuis (Databricks)

This session is repeated. In this session, you will learn how to integrate Lakeflow Declarative Pipelines with external systems in order to ingest and send data virtually anywhere. Lakeflow Declarative Pipelines is most often used for ingestion and ETL into the lakehouse. New capabilities like the Lakeflow Declarative Pipelines Sinks API and added support for the Python Data Source API and foreachBatch have opened up Lakeflow Declarative Pipelines to support almost any integration. This includes popular Apache Spark™ integrations like JDBC, Kafka, external and managed Delta tables, Azure Cosmos DB, MongoDB and more.
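
For a flavor of the Sinks API mentioned above, the hedged sketch below writes pipeline output to Kafka; the sink name, topic and broker are placeholders, and the exact API surface should be confirmed against the current Lakeflow documentation.

    # Hedged sketch: send Lakeflow Declarative Pipelines output to Kafka via the Sinks API.
    import dlt

    dlt.create_sink(
        name="orders_out",  # placeholder sink name
        format="kafka",
        options={
            "kafka.bootstrap.servers": "broker:9092",  # placeholder broker
            "topic": "orders_enriched",
        },
    )

    @dlt.append_flow(name="orders_to_kafka", target="orders_out")
    def orders_to_kafka():
        # Kafka sinks expect a string/binary 'value' column (and optionally 'key').
        return (
            spark.readStream.table("main.sales.orders_enriched")
            .selectExpr("to_json(struct(*)) AS value")
        )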