talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

105

Filtering by: Data Lakehouse

Sessions & talks

Showing 51–75 of 105 · Newest first

Unified Governance and Enterprise Sharing for Data + AI

2025-06-11 Watch
talk
Molly Just-Behr (Databricks), Suresh Kaudi (World Bank), Luke Bilbro (Databricks), Marcelo Diotto (Petrobras)

The Databricks Lakehouse for Public Sector is the only enterprise data platform that lets you leverage all your data, from any source, on any workload to deliver better citizen services, warfighter support and student success, with the best outcomes, at the lowest cost and with the greatest investment protection.

What’s new with Collaboration: Delta Sharing, Clean Room, Marketplace and the Ecosystem

2025-06-11 Watch
talk
Tao Tao (Databricks), Harish Gaur (Databricks)

Databricks continues to redefine how organizations securely and openly collaborate on data. With new innovations like Clean Rooms for multi-party collaboration, Sharing for Lakehouse Federation, cross-platform view sharing and Databricks Apps in the Marketplace, teams can now share and access data more easily, cost-effectively and across platforms — whether or not they’re using Databricks. In this session, we’ll deliver live demos of the key capabilities that power this transformation:
- Delta Sharing: the industry’s only open protocol for seamless cross-platform data sharing
- Databricks Marketplace: a central hub for discovering and monetizing data and AI assets
- Clean Rooms: a privacy-preserving solution for secure, multi-party data collaboration
Join us to see how these tools enable trusted data sharing, accelerate insights and drive innovation across your ecosystem. Bring your questions and walk away with practical ways to put these capabilities into action today.
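
For readers who want something concrete to try, a minimal provider-side sketch of Delta Sharing in Databricks SQL (issued here through PySpark) is shown below; the share, table and recipient names are placeholders, not objects from the session.

    # Provider-side Delta Sharing sketch (illustrative names; assumes a Unity Catalog-enabled workspace).
    spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
    spark.sql("ALTER SHARE sales_share ADD TABLE main.retail.daily_orders")

    # Create a recipient and grant it access; open (non-Databricks) recipients receive an activation link.
    spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_analytics")
    spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_analytics")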

Enterprise Financial Crime Detection: A Lakehouse Framework for FATF, Basel III, and BSA Compliance

2025-06-11 Watch
talk
Deepak Khetpal (Tiger Analytics), Surya Sai Turaga (Databricks)

We will present a framework for FinCrime detection built on the Databricks lakehouse architecture, focusing on how institutions can achieve both the data flexibility and the ACID transaction guarantees essential for FinCrime monitoring. The framework incorporates advanced ML models for anomaly detection, pattern recognition and predictive analytics, while maintaining the clear data lineage and audit trails required by regulatory bodies. We will also discuss specific improvements in false-positive reduction, detection speed and regulatory reporting turnaround, and delve into how the architecture addresses specific FATF recommendations, Basel III risk management requirements and BSA compliance obligations, particularly in transaction monitoring and SAR filing. The ability to handle structured and unstructured data while maintaining data quality and governance makes the framework particularly valuable for large financial institutions dealing with complex, multi-jurisdictional compliance requirements.
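
The session abstract stays at the architecture level; purely as a hedged illustration of the kind of anomaly-detection step such a framework might include, the sketch below scores transactions with scikit-learn's IsolationForest. The table and column names are invented for the example.

    # Illustrative anomaly scoring over engineered transaction features (hypothetical table and columns).
    from sklearn.ensemble import IsolationForest

    features = (
        spark.table("fincrime.silver.txn_features")
        .select("amount", "txns_per_day", "cross_border_ratio", "counterparty_risk")
        .toPandas()
    )

    # Fit an unsupervised model and surface the most unusual transactions for analyst review.
    model = IsolationForest(contamination=0.01, random_state=42).fit(features)
    features["anomaly_score"] = model.decision_function(features)  # lower = more anomalous
    review_queue = features.nsmallest(100, "anomaly_score")        # candidates for SAR review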

Sponsored by: Onehouse | Open By Default, Fast By Design: One Lakehouse That Scales From BI to AI

2025-06-11 Watch
lightning_talk
Kyle Weller (Onehouse.ai)

You already see the value of the lakehouse. But are you truly maximizing its potential across all workloads, from BI to AI? In this session, Onehouse unveils how our open lakehouse architecture unifies your entire stack, enabling true interoperability across formats, catalogs, and engines. From lightning-fast ingestion at scale to cost-efficient processing and multi-catalog sync, Onehouse helps you go beyond trade-offs. Discover how Apache XTable (Incubating) enables cross-table-format compatibility, how OpenEngines puts your data in front of the best engine for the job, and how OneSync keeps data consistent across Snowflake, Athena, Redshift, BigQuery, and more. Meanwhile, our purpose-built lakehouse runtime slashes ingest and ETL costs. Whether you’re delivering BI, scaling AI, or building the next big thing, you need a lakehouse that’s open and powerful. Onehouse opens everything—so your data can power anything.

Scaling Data Quality at Zillow: Migrating and Enhancing Data Quality Systems on Databricks

2025-06-10
lightning_talk
Laura Zhou (Zillow), Firas Farah (Databricks)

Zillow has well-established, comprehensive systems for defining and enforcing data quality contracts and detecting anomalies. In this session, we will share how we evaluated Databricks’ native data quality features and why we chose Lakeflow Declarative Pipelines expectations for our declarative pipelines, along with a combination of enforced constraints and self-defined queries for other job types. Our evaluation considered factors such as performance overhead, cost and scalability. We’ll highlight key improvements over our previous system and demonstrate how these choices have enabled Zillow to enforce scalable, production-grade data quality. Additionally, we are actively testing Databricks’ latest data quality innovations, including enhancements to lakehouse monitoring and the newly released DQX project from Databricks Labs. In summary, we will cover Zillow’s approach to data quality in the lakehouse, key lessons from our migration and actionable takeaways.
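
For context, expectations in Lakeflow Declarative Pipelines (the Python module is still named dlt) are declared roughly as in the sketch below; the dataset names and rules are illustrative, not Zillow's.

    # Sketch of declarative data quality expectations (illustrative source table and rules).
    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Listings with basic data quality contracts applied")
    @dlt.expect_or_drop("valid_price", "price > 0")       # drop rows that violate the rule
    @dlt.expect("has_zip", "zip_code IS NOT NULL")        # record violations but keep the rows
    def clean_listings():
        return (
            spark.readStream.table("raw.listings")
            .withColumn("ingested_at", F.current_timestamp())
        )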

Sponsored by: Actian | Beyond the Lakehouse: Unlocking Enterprise-Wide AI-Ready Data with Unified Metadata Intelligence

2025-06-10 Watch
lightning_talk
Emma McGrattan (Actian)

As organizations scale AI initiatives on platforms like Databricks, one challenge remains: bridging the gap between the data in the lakehouse and the vast, distributed data that lives elsewhere. Turning massive volumes of technical metadata into trusted, business-ready insight requires more than cataloging what's inside the lakehouse—it demands true enterprise-wide intelligence. Actian CTO Emma McGrattan will explore how combining Databricks Unity Catalog with the Actian Data Platform extends visibility, governance, and trust beyond the lakehouse. Learn how leading enterprises are:
- Integrating metadata across all enterprise data assets for complete visibility
- Enriching Unity Catalog metadata with business context for broader usability
- Empowering non-technical users to discover, trust, and act on AI-ready data
- Building a foundation for scalable data productization with governance by design

Sponsored by: Slalom | Nasdaq's Journey from Fragmented Customer Data to AI-Ready Insights

2025-06-10 Watch
lightning_talk
Soumya Ghosh (Slalom)

Nasdaq’s rapid growth through acquisitions led to fragmented client data across multiple Salesforce instances, limiting cross-sell potential and sales insights. To solve this, Nasdaq partnered with Slalom to build a unified Client Data Hub on the Databricks Lakehouse Platform. This cloud-based solution merges CRM, product usage, and financial data into a consistent, 360° client view accessible across all Salesforce orgs with bi-directional integration. It enables personalized engagement, targeted campaigns, and stronger cross-sell opportunities across all business units. By delivering this 360 view directly in Salesforce, Nasdaq is improving sales visibility, client satisfaction, and revenue growth. The platform also enables advanced analytics like segmentation, churn prediction, and revenue optimization. With centralized data in Databricks, Nasdaq is now positioned to deploy next-gen Agentic AI and chatbots to drive efficiency and enhance sales and marketing experiences.

Unlocking the Power of Iceberg: Our Journey to a Unified Lakehouse on Databricks

2025-06-10 Watch
lightning_talk
Tomer Sabag (LSports)

This session showcases our journey of adopting Apache Iceberg™ to build a modern lakehouse architecture and leveraging Databricks advanced Iceberg support to take it to the next level. We’ll dive into the key design principles behind our lakehouse, the operational challenges we tackled and how Databricks enabled us to unlock enhanced performance, scalability and streamlined data workflows. Whether you’re exploring Apache Iceberg™ or building a lakehouse on Databricks, this session offers actionable insights, lessons learned and best practices for modern data engineering.
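
As a hedged illustration of one way Databricks exposes Iceberg alongside Delta, the sketch below enables UniForm-style Iceberg metadata on a new table; the table name is made up and the properties reflect my reading of the public docs rather than LSports' actual setup.

    # Illustrative: create a Delta table that also publishes Iceberg metadata (UniForm).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.sports.odds_events (
            event_id BIGINT, market STRING, odds DOUBLE, updated_at TIMESTAMP
        )
        TBLPROPERTIES (
            'delta.enableIcebergCompatV2' = 'true',
            'delta.universalFormat.enabledFormats' = 'iceberg'
        )
    """)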

Databricks on Databricks: Powering Marketing Insights with Lakehouse

2025-06-10 Watch
talk
Elizabeth Dobbs (Databricks), Anoop Muraleedharan (Databricks)

This presentation outlines the evolution of our marketing data strategy, focusing on how we’ve built a strong foundation using the Databricks Lakehouse. We will explore key advancements across data ingestion, strategy, and insights, highlighting the transition from legacy systems to a more scalable and intelligent infrastructure. Through real-world applications, we will showcase how unified Customer 360 insights drive personalization, predictive analytics enhance campaign effectiveness, and GenAI optimizes content creation and marketing execution. Looking ahead, we will demonstrate the next phase of our CDP, the shift toward an end-user-first analytics model powered by AI/BI, Genie and Matik, and the growing importance of clean rooms for secure data collaboration. This is just the beginning, and we are poised to unlock even greater capabilities in the future.

From Largest to Best: How We Transformed Databricks’ Biggest Workspace With Unity Catalog

2025-06-10 Watch
talk
Donghan Zhang (Databricks), Li Yang (Databricks)

Join us as we unveil how we transformed the largest Databricks workspace into the best-in-class lakehouse through Unity Catalog. Discover how we harnessed lineage and unified access management to build ultimate governance automation.

Sponsored by: Coalesce | From Raw Data to Real-Time Retention: Powering Customer Health Scores on Databricks

2025-06-10
talk
Josh Hall (Coalesce)

Understanding customer engagement and retention isn’t optional—it’s mission-critical. Join us for a live demo to see how you can build a scalable, governed customer health scoring model by transforming raw signals into actionable insights. Discover how Coalesce’s low-code development platform works seamlessly with Databricks’ lakehouse architecture to unify and operationalize customer data at scale. With built-in governance, automation, and metadata intelligence, you’ll deliver trusted scores that support proactive decision-making across the business. Why attend?
- Accelerate time-to-insight with automated, low-code transformations
- Build repeatable, enterprise-grade scoring models with full data lineage
- Ensure governance, transparency, and compliance at every step

Kernel, Catalog, Action! Reimagining our Delta-Spark Connector with DSv2

2025-06-10 Watch
lightning_talk
Scott Sandre (Databricks)

Delta Lake is redesigning its Spark connector through the combination of three key technologies: First, we're updating our Spark APIs to DSv2 to achieve deeper catalog integration and improved integration with the Spark optimizer. Second, we're fully integrating on top of Delta Kernel to take advantage of its simplified abstraction of Delta protocol complexities, accelerating feature adoption and improving maintainability. Third, we are transforming Delta to become a catalog-aware lakehouse format with Catalog Commits, enabling more efficient metadata management, governance and query performance. Join us to explore how we're advancing Delta Lake's architecture, pushing the boundaries of metadata management and creating a more intelligent, performant data lakehouse platform.
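
For background, the snippet below shows how Delta Lake plugs into Spark's existing catalog API in open-source Spark today; it is context for the talk, not the new DSv2 connector itself, and it assumes the delta-spark package is on the classpath.

    # Current OSS Spark + Delta Lake catalog wiring (background only, not the new DSv2 connector).
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("delta-catalog-demo")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )
    spark.sql("CREATE TABLE IF NOT EXISTS demo_events (id BIGINT, ts TIMESTAMP) USING delta")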

AI-Driven Drug Discovery: Accelerating Molecular Insights With NVIDIA and Databricks

2025-06-10 Watch
talk
Karuna Nadadur (NVIDIA), Srijit Chandrashekhar Nair (Databricks)

This session is repeated. In the race to revolutionize healthcare and drug discovery, biopharma companies are turning to AI to streamline workflows and unlock new scientific insights. In this session, we will explore how NVIDIA BioNeMo, combined with the Databricks Delta Lakehouse, can be used to advance drug discovery for critical applications like molecular structure modeling, protein folding and diagnostics. We’ll demonstrate how BioNeMo pre-trained models can run inference on data securely stored in Delta Lake, delivering actionable insights. By leveraging containerized solutions on Databricks’ ML Runtime with GPU acceleration, users can achieve significant performance gains compared to traditional CPU-based computation.

AI Powering Epsilon's Identity Strategy: Unified Marketing Platform on Databricks

2025-06-10 Watch
talk
Gairik Chakraborty (Epsilon Data Management), Boaz Super (Epsilon Data Management)

Join us to hear about how Epsilon Data Management migrated Epsilon’s unique, AI-powered marketing identity solution from multi-petabyte on-prem Hadoop and data warehouse systems to a unified Databricks Lakehouse platform. This transition enabled Epsilon to further scale its Decision Sciences solution and enable new cloud-based AI research capabilities on time and within budget, without being bottlenecked by the resource constraints of on-prem systems. Learn how Delta Lake, Unity Catalog, MLflow and LLM endpoints powered massive data volume, reduced data duplication, improved lineage visibility, accelerated Data Science and AI, and enabled new data to be immediately available for consumption by the entire Epsilon platform in a privacy-safe way. Using the Databricks platform as the base for AI and Data Science at global internet scale, Epsilon deploys marketing solutions across multiple cloud providers and multiple regions for many customers.

From Days to Seconds — Reducing Query Times on Large Geospatial Datasets by 99%

2025-06-10 Watch
talk
Chris Crawford (Databricks), Hobson Bryan (Global Water Security Center)

The Global Water Security Center translates environmental science into actionable insights for the U.S. Department of Defense. Prior to incorporating Databricks, responding to these requests required querying approximately five hundred thousand raster files representing over five hundred billion points. By leveraging lakehouse architecture, Databricks Auto Loader, Spark Streaming, Databricks Spatial SQL, H3 geospatial indexing and Databricks Liquid Clustering, we were able to drastically reduce our “time to analysis” from multiple business days to a matter of seconds. Now, our data scientists execute queries on pre-computed tables in Databricks, resulting in a “time to analysis” that is 99% faster, giving our teams more time for deeper analysis of the data. Additionally, we’ve incorporated Databricks Workflows, Databricks Asset Bundles, Git and GitHub Actions to support CI/CD across workspaces. We completed this work in close partnership with Databricks.
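
The abstract names the building blocks; a hedged sketch of the pre-computation pattern it describes (H3 indexing plus Liquid Clustering) could look like the following, with made-up table and column names.

    # Illustrative pre-computation: index points with H3 and cluster the table on the cell ID.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS geo.gold.precip_h3
        CLUSTER BY (h3_cell)
        AS SELECT
            h3_longlatash3(longitude, latitude, 7) AS h3_cell,   -- resolution-7 cells
            observation_date,
            precipitation_mm
        FROM geo.silver.raster_points
    """)

    # Point lookups now prune to a handful of clustered files instead of scanning everything.
    spark.sql("""
        SELECT avg(precipitation_mm)
        FROM geo.gold.precip_h3
        WHERE h3_cell = h3_longlatash3(-87.65, 41.85, 7)
    """).show()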

Geo-Powering Insights: The Art of Spatial Data Integration and Visualization

2025-06-10 Watch
talk
Mathieu Pelletier (Databricks)

In this presentation, we will explore how to leverage Databricks' SQL engine to efficiently ingest and transform geospatial data. We'll demonstrate the seamless process of connecting to external systems such as ArcGIS to retrieve datasets, showcasing the platform's versatility in handling diverse data sources. We'll then delve into the power of Databricks Apps, illustrating how you can create custom geospatial dashboards using various frameworks like Streamlit and Flask, or any framework of your choice. This flexibility allows you to tailor your visualizations to your specific needs and preferences. Furthermore, we'll highlight the Databricks Lakehouse's integration capabilities with popular dashboarding tools such as Tableau and Power BI. This integration enables you to combine the robust data processing power of Databricks with the advanced visualization features of these specialized tools.
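
As a rough sketch of the Databricks Apps pattern described here, a small Streamlit app can query a SQL warehouse with the databricks-sql-connector and plot the results; the hostname, HTTP path, token and table below are placeholders.

    # Minimal Streamlit sketch for a geospatial dashboard (placeholder connection details).
    import os
    import pandas as pd
    import streamlit as st
    from databricks import sql  # databricks-sql-connector

    st.title("Field assets map")

    with sql.connect(
        server_hostname=os.environ["DATABRICKS_HOST"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT latitude AS lat, longitude AS lon FROM geo.gold.assets LIMIT 1000")
            rows = cur.fetchall()

    # st.map expects lat/lon columns.
    st.map(pd.DataFrame(rows, columns=["lat", "lon"]))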

Unity Catalog Upgrades Made Easy. Step-by-Step Guide for Databricks Labs UCX

2025-06-10 Watch
talk
Vuong (Databricks), Liran Bareket (Databricks)

The Databricks Labs project UCX aims to optimize the Unity Catalog (UC) upgrade process, ensuring a seamless transition for businesses. This session will delve into various aspects of the UCX project, including the installation and configuration of UCX, the use of the UCX Assessment Dashboard to reduce upgrade risks and prepare effectively for a UC upgrade, and the automation of key components such as group, table and code migration. Attendees will gain comprehensive insights into leveraging UCX and Lakehouse Federation for a streamlined and efficient upgrade process. This session is aimed at customers new to UCX as well as veterans.

Gaining Insight From Image Data in Databricks Using Multi-Modal Foundation Model API

2025-06-10 Watch
lightning_talk
Ankit Mathur (Databricks)

Unlock the hidden potential in your image data without specialized computer vision expertise! This session explores how to leverage Databricks' multi-modal Foundation Model APIs to analyze, classify and extract insights from visual content. Learn how Databricks provides a unified API to understand images using powerful foundation models within your data workflows. Key takeaways:
- Implementing efficient workflows for image data processing within your Databricks lakehouse
- Understanding multi-modal foundation models for image understanding
- Integrating image analysis with other data types for business insights
- Using OpenAI-compatible APIs to query multi-modal models
- Building end-to-end pipelines from image ingestion to model deployment
Whether analyzing product images, processing visual documents or building content moderation systems, you'll discover how to extract valuable insights from your image data within the Databricks ecosystem.
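
The OpenAI-compatible pattern listed in the takeaways looks roughly like the sketch below; the endpoint name is a placeholder, and the message shape follows the standard OpenAI vision-style schema rather than anything specific to this talk.

    # Hedged sketch: query a multi-modal model through Databricks' OpenAI-compatible serving API.
    import base64
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DATABRICKS_TOKEN"],
        base_url=f"https://{os.environ['DATABRICKS_HOST']}/serving-endpoints",
    )

    with open("product_photo.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="example-multimodal-endpoint",  # placeholder endpoint name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe any visible product defects."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)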

Powering Personalization at Scale with Data: How T-Mobile and Deep Sync Help Brands Connect with Consumers

2025-06-10 Watch
lightning_talk
Jeff Frantz (T-Mobile), Pieter De Temmerman (Deep Sync)

Discover how T-Mobile and Deep Sync are redefining personalized marketing through the power of Databricks. Deep Sync, a leader in deterministic identity solutions, has brought its identity spine, which covers over 97% of U.S. households with the most current and accurate attribute data available, to the Databricks Lakehouse. T-Mobile is bringing to market for the first time a new data services business that introduces privacy-compliant, consent-based consumer data. Together, T-Mobile and Deep Sync are transforming how brands engage with consumers—enabling bespoke, hyper-personalized workflows, identity-driven insights, and closed-loop measurement through Databricks’ Multi-Party Cleanrooms. Join this session to learn how data and identity are converging to solve today’s modern marketing challenges so consumers can rediscover what it feels like to be seen, not targeted.

Gen AI Deployment and Monitoring

2025-06-10
talk

This course introduces learners to deploying, operationalizing, and monitoring generative artificial intelligence (AI) applications. First, learners will develop knowledge and skills in deploying generative AI applications using tools like Model Serving. Next, the course will discuss operationalizing generative AI applications following modern LLMOps best practices and recommended architectures. Finally, learners will be introduced to the idea of monitoring generative AI applications and their components using Lakehouse Monitoring.
Pre-requisites: Familiarity with prompt engineering and retrieval-augmented generation (RAG) techniques, including data preparation, embeddings, vectors, and vector databases; a foundational knowledge of Databricks Data Intelligence Platform tools for evaluation and governance (particularly Unity Catalog).
Labs: Yes
Certification Path: Databricks Certified Generative AI Engineer Associate
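
For orientation only, deploying a Unity Catalog-registered model to Model Serving can be scripted with MLflow's deployments client roughly as below; the endpoint and model names are placeholders and the config keys should be checked against current documentation.

    # Hedged sketch: create a Model Serving endpoint for a UC-registered model.
    from mlflow.deployments import get_deploy_client

    client = get_deploy_client("databricks")
    client.create_endpoint(
        name="example-rag-endpoint",  # placeholder
        config={
            "served_entities": [{
                "entity_name": "main.genai.rag_chain",  # placeholder UC model name
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }]
        },
    )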

Machine Learning Operations

2025-06-10
talk

This course will guide participants through a comprehensive exploration of machine learning model operations, focusing on MLOps and model lifecycle management. The initial segment covers essential MLOps components and best practices, providing participants with a strong foundation for effectively operationalizing machine learning models. In the latter part of the course, we will delve into the basics of the model lifecycle, demonstrating how to navigate it seamlessly using the Model Registry in conjunction with Unity Catalog for efficient model management. By the course's conclusion, participants will have gained practical insights and a well-rounded understanding of MLOps principles, equipped with the skills needed to navigate the intricate landscape of machine learning model operations.
Pre-requisites: Familiarity with the Databricks workspace and notebooks, familiarity with Delta Lake and the Lakehouse, intermediate-level knowledge of Python, and an understanding of basic MLOps concepts and practices as well as the infrastructure and importance of monitoring MLOps solutions.
Labs: Yes
Certification Path: Databricks Certified Machine Learning Associate
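
The Model Registry plus Unity Catalog workflow covered here boils down to a few MLflow calls, sketched below with placeholder names; the run URI is a stand-in, not a real run.

    # Hedged sketch: register a run's model in Unity Catalog and promote it with an alias.
    import mlflow
    from mlflow import MlflowClient

    mlflow.set_registry_uri("databricks-uc")  # use Unity Catalog as the model registry

    model_version = mlflow.register_model(
        "runs:/<run_id>/model",        # placeholder run URI
        "main.ml_demo.churn_model",    # placeholder three-level UC model name
    )

    MlflowClient().set_registered_model_alias(
        name="main.ml_demo.churn_model",
        alias="champion",
        version=model_version.version,
    )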

Site to Insight: Powering Construction Analytics Through Delta Sharing

2025-06-10 Watch
lightning_talk
Vinodh Thiagarajan (Procore), Vishnu Sreenivasan (Procore)

At Procore, we're transforming the construction industry through innovative data solutions. This session unveils how we've supercharged our analytics offerings using a unified lakehouse architecture and Delta Sharing, delivering game-changing results for our customers and our business, and how data professionals can unlock the full potential of their data assets and drive meaningful business outcomes. Key highlights:
- Learn how we've implemented seamless, secure sharing of large datasets across various BI tools and programming languages, dramatically accelerating time-to-insights for our customers
- Discover our approach to sharing dynamically filtered subsets of data across our numerous customers with cross-platform view sharing
- We'll demonstrate how our architecture has eliminated the need for data replication, fostering a more efficient, collaborative data ecosystem
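
On the consuming side of Delta Sharing (the flip side of what this session describes), reading a shared table takes only a few lines with the delta-sharing Python client; the profile path and table coordinates below are placeholders.

    # Hedged sketch: read a Delta Sharing table from any Python environment.
    import delta_sharing

    profile = "/path/to/config.share"  # credential file issued by the provider (placeholder path)

    # List what has been shared with us, then load one table into pandas.
    client = delta_sharing.SharingClient(profile)
    print(client.list_all_tables())

    df = delta_sharing.load_as_pandas(f"{profile}#construction_share.analytics.project_costs")
    print(df.head())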

Bridging Ontologies & Lakehouses: Palantir AIP + Databricks for Secure Autonomous AI

2025-06-10 Watch
talk
Siddhant Ekale (Palantir), Ben Abood (Databricks)

AI is moving from pilots to production, but many organizations still struggle to connect boardroom ambitions with operational reality. Palantir’s Artificial Intelligence Platform (AIP) and the Databricks Data Intelligence Platform now form a single, open architecture that closes this gap by pairing Palantir’s operational, decision-empowering Ontology with Databricks’ industry-leading scale, governance and Lakehouse economics. The result: real-time, AI-powered, autonomous workflows that are already powering mission-critical outcomes for the U.S. Department of Defense, bp and other joint customers across the public and private sectors. In this technically grounded but business-focused session you will see the new reference architecture in action. We will walk through how Unity Catalog and Palantir Virtual Tables provide governed, zero-copy access to Lakehouse data and back mission-critical operational workflows on top of Palantir’s semantic ontology and agentic AI capabilities. We will also explore how Palantir’s no-code and pro-code tooling integrates with Databricks compute to orchestrate builds and write tables to Unity Catalog. Come hear from customers currently using this architecture to drive critical business outcomes seamlessly across Databricks and Palantir.

From Datavault to Delta Lake: Streamlining Data Sync with Lakeflow Connect

2025-06-10 Watch
talk
Olivia Ren (Databricks), Andrew Clarke (Australian Red Cross Lifeblood)

In this session, we will explore the Australian Red Cross Lifeblood's approach to synchronizing an Azure SQL Datavault 2.0 (DV2.0) implementation with Unity Catalog (UC) using Lakeflow Connect. Lifeblood's DV2.0 data warehouse, which includes raw vault (RV) and business vault (BV) tables, as well as information marts defined as views, required a multi-step process to achieve data/business logic sync with UC. This involved using Lakeflow Connect to ingest RV and BV data, followed by a custom process utilizing JDBC to ingest view definitions, and the automated/manual conversion of T-SQL to Databricks SQL views, with Lakehouse Monitoring for validation. In this talk, we will share our journey, the design decisions we made, and how the resulting solution now supports analytics workloads, analysts, and data scientists at Lifeblood.
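
One step in this journey, pulling the Azure SQL view definitions over JDBC before converting the T-SQL, might be sketched as follows; the connection details, secret scope and target table are illustrative assumptions, not Lifeblood's code.

    # Hedged sketch: ingest Azure SQL view definitions for later T-SQL -> Databricks SQL conversion.
    jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=dw"  # placeholder

    view_defs = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("user", dbutils.secrets.get("lakehouse", "sql_user"))          # placeholder scope/keys
        .option("password", dbutils.secrets.get("lakehouse", "sql_password"))
        .option("query", """
            SELECT s.name AS schema_name, v.name AS view_name, m.definition
            FROM sys.views v
            JOIN sys.schemas s ON v.schema_id = s.schema_id
            JOIN sys.sql_modules m ON v.object_id = m.object_id
        """)
        .load()
    )

    # Persist the raw definitions so the automated/manual conversion step has an audit trail.
    view_defs.write.mode("overwrite").saveAsTable("lakehouse.meta.source_view_definitions")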

Lakeflow Declarative Pipelines Integrations and Interoperability: Get Data From — and to — Anywhere

2025-06-10 Watch
talk
Ryan Nienhuis (Databricks)

This session is repeated. In this session, you will learn how to integrate Lakeflow Declarative Pipelines with external systems in order to ingest and send data virtually anywhere. Lakeflow Declarative Pipelines is most often used for ingestion and ETL into the lakehouse. New capabilities like the Lakeflow Declarative Pipelines Sinks API and added support for the Python Data Source API and foreachBatch have opened up Lakeflow Declarative Pipelines to support almost any integration. This includes popular Apache Spark™ integrations like JDBC, Kafka, external and managed Delta tables, Azure Cosmos DB, MongoDB and more.
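
For a flavor of the Sinks API mentioned above, the hedged sketch below writes pipeline output to Kafka; the sink name, topic and broker are placeholders, and the exact API surface should be confirmed against the current Lakeflow documentation.

    # Hedged sketch: send Lakeflow Declarative Pipelines output to Kafka via the Sinks API.
    import dlt

    dlt.create_sink(
        name="orders_out",  # placeholder sink name
        format="kafka",
        options={
            "kafka.bootstrap.servers": "broker:9092",  # placeholder broker
            "topic": "orders_enriched",
        },
    )

    @dlt.append_flow(name="orders_to_kafka", target="orders_out")
    def orders_to_kafka():
        # Kafka sinks expect a string/binary 'value' column (and optionally 'key').
        return (
            spark.readStream.table("main.sales.orders_enriched")
            .selectExpr("to_json(struct(*)) AS value")
        )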