talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

117

Filtering by: Delta

Sessions & talks

Showing 76–100 of 117 · Newest first

Site to Insight: Powering Construction Analytics Through Delta Sharing

2025-06-10 Watch
lightning_talk
Vinodh Thiagarajan (Procore), Vishnu Sreenivasan (Procore)

At Procore, we're transforming the construction industry through innovative data solutions. This session unveils how we've supercharged our analytics offerings using a unified lakehouse architecture and Delta Sharing, delivering game-changing results for our customers and our business, and shows how data professionals can unlock the full potential of their data assets and drive meaningful business outcomes. Key highlights:
- Learn how we've implemented seamless, secure sharing of large datasets across various BI tools and programming languages, dramatically accelerating time-to-insights for our customers
- Discover our approach to sharing dynamically filtered subsets of data across our numerous customers with cross-platform view sharing
- We'll demonstrate how our architecture has eliminated the need for data replication, fostering a more efficient, collaborative data ecosystem
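For context on the sharing mechanism this session relies on: the open-source delta-sharing client addresses a shared table as profile-file#share.schema.table. A minimal sketch of that addressing scheme (the profile and share/schema/table names below are hypothetical placeholders):

```python
# Sketch of consuming a Delta Share with the open-source `delta_sharing`
# Python client. All names here are hypothetical placeholders.

def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the coordinate the delta-sharing client expects:
    <profile-file>#<share>.<schema>.<table>"""
    return f"{profile_path}#{share}.{schema}.{table}"

url = table_url("procore.share", "construction", "analytics", "site_metrics")
print(url)  # procore.share#construction.analytics.site_metrics

# With the client installed (pip install delta-sharing), the table can be
# loaded without replicating the underlying data:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
```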

Enabling Sleep Science Research With Databricks and Delta Sharing

2025-06-10 Watch
talk
Alexandr Rivlin (Sleep Number Labs), Sajeev Mayandi (Sleep Number)

Leveraging Databricks as a platform, we facilitate the sharing of anonymized datasets across various Databricks workspaces and accounts, spanning multiple cloud environments such as AWS, Azure, and Google Cloud. This capability, powered by Delta Sharing, extends both within and outside Sleep Number, enabling accelerated insights while ensuring compliance with data security and privacy standards. In this session, we will showcase our architecture and implementation strategy for data sharing, highlighting the use of Databricks’ Unity Catalog and Delta Sharing, along with integration with platforms like Jira, Jenkins, and Terraform to streamline project management and system orchestration.

From Datavault to Delta Lake: Streamlining Data Sync with Lakeflow Connect

2025-06-10 Watch
talk
Olivia Ren (Databricks), Andrew Clarke (Australian Red Cross Lifeblood)

In this session, we will explore the Australian Red Cross Lifeblood's approach to synchronizing an Azure SQL Datavault 2.0 (DV2.0) implementation with Unity Catalog (UC) using Lakeflow Connect. Lifeblood's DV2.0 data warehouse, which includes raw vault (RV) and business vault (BV) tables, as well as information marts defined as views, required a multi-step process to achieve data/business logic sync with UC. This involved using Lakeflow Connect to ingest RV and BV data, followed by a custom process utilizing JDBC to ingest view definitions, and the automated/manual conversion of T-SQL to Databricks SQL views, with Lakehouse Monitoring for validation. In this talk, we will share our journey, the design decisions we made, and how the resulting solution now supports analytics workloads, analysts, and data scientists at Lifeblood.
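The abstract mentions automated conversion of T-SQL view definitions to Databricks SQL. A hypothetical sketch of the kind of mechanical rewriting involved (real view definitions need many more cases; this handles two common ones):

```python
import re

# Hypothetical sketch of automated T-SQL -> Databricks SQL view rewriting.
# Real migrations handle far more dialect differences than these two rules.

def tsql_to_databricks_sql(view_sql: str) -> str:
    # [bracketed] identifiers -> `backticked` identifiers
    sql = re.sub(r"\[([^\]]+)\]", r"`\1`", view_sql)
    # GETDATE() -> current_timestamp()
    sql = re.sub(r"GETDATE\(\)", "current_timestamp()", sql, flags=re.IGNORECASE)
    return sql

print(tsql_to_databricks_sql(
    "CREATE VIEW [mart].[orders_v] AS "
    "SELECT [id], GETDATE() AS loaded_at FROM [rv].[orders]"
))
```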

Lakeflow Declarative Pipelines Integrations and Interoperability: Get Data From — and to — Anywhere

2025-06-10 Watch
talk
Ryan Nienhuis (Databricks)

This session is repeated. In this session, you will learn how to integrate Lakeflow Declarative Pipelines with external systems in order to ingest and send data virtually anywhere. Lakeflow Declarative Pipelines is most often used for ingestion and ETL into the lakehouse. New capabilities like the Lakeflow Declarative Pipelines Sinks API and added support for the Python Data Source API and foreachBatch have opened up Lakeflow Declarative Pipelines to support almost any integration, including popular Apache Spark™ integrations like JDBC, Kafka, external and managed Delta tables, Azure Cosmos DB, MongoDB and more.
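The foreachBatch pattern the abstract refers to hands a user function each micro-batch plus a batch id, and the function pushes rows to any external system. A stand-alone sketch of that contract using plain lists, so the shape is visible without a Spark cluster (the in-memory sink is a stand-in for Kafka, JDBC, etc.):

```python
# Simulates the foreachBatch contract: Spark would call process_batch once
# per micro-batch with the batch's rows and a monotonically increasing id.

external_sink = []  # stand-in for an external system (Kafka, JDBC, ...)

def process_batch(batch_rows, batch_id):
    # In real code this would be batch_df.write... against the target system.
    for row in batch_rows:
        external_sink.append((batch_id, row))

# Simulate two micro-batches arriving from a stream
for i, batch in enumerate([[{"id": 1}], [{"id": 2}, {"id": 3}]]):
    process_batch(batch, i)

print(len(external_sink))  # 3
```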

Toyota: Maximizing Business Value and Ensuring Data Privacy with Databricks in Connected Vehicles

2025-06-10
talk
Yoshihiro Oe (TOYOTA MOTOR CORPORATION), Satoshi Kuramitsu (Databricks)

As global data privacy regulations tighten, balancing user data protection with maximizing its business value is crucial. This presentation explores how integrating Databricks into our connected-vehicle data platform enhances both governance and business outcomes. We'll highlight a case where migrating from EMR to Databricks improved deletion performance and cut costs by 99% with Delta Lake. This shift not only ensures compliance with data-privacy regulations but also maximizes the potential of connected-vehicle data. We are developing a platform that balances compliance with business value and sets a global standard for data usage, inviting partners to join us in building a secure, efficient mobility ecosystem.

Accelerating Model Development and Fine-Tuning on Databricks with TwelveLabs

2025-06-10 Watch
talk
Wenwen Gao (NVIDIA), Aiden Lee (Twelve Labs, Inc)

Scaling large language models (LLMs) and multimodal architectures requires efficient data management and computational power. NVIDIA NeMo Framework Megatron-LM on Databricks is an open source solution that integrates GPU acceleration and advanced parallelism with Databricks Delta Lakehouse, streamlining workflows for pre-training and fine-tuning models at scale. This session highlights context parallelism, a unique NeMo capability for parallelizing over sequence lengths, making it ideal for video datasets with large embeddings. Through the case study of TwelveLabs’ Pegasus-1 model, learn how NeMo empowers scalable multimodal AI development, from text to video processing, setting a new standard for LLM workflows.

Let's Save Tons of Money With Cloud-Native Data Ingestion!

2025-06-10 Watch
talk
Tyler Croy (Scribd, Inc.)

Delta Lake is a fantastic technology for quickly querying massive data sets, but first you need those massive data sets! In this session we will dive into the cloud-native architecture Scribd has adopted to ingest data from AWS Aurora, SQS, Kinesis Data Firehose and more. By using off-the-shelf open source tools like kafka-delta-ingest, oxbow and Airbyte, Scribd has redefined its ingestion architecture to be more event-driven, reliable, and most importantly: cheaper. No jobs needed! Attendees will learn how to use third-party tools in concert with a Databricks and Unity Catalog environment to provide a highly efficient and available data platform. This architecture will be presented in the context of AWS but can be adapted for Azure, Google Cloud Platform or even on-premise environments.

Spark 4.0 and Delta 4.0 For Streaming Data

2025-06-10 Watch
talk
Bryce Bartmann (Shell)

Real-time data is one of the most important datasets for any data and AI platform across any industry. Spark 4.0 and Delta 4.0 include new features that make ingestion and querying of real-time data better than ever before, such as:
- Python custom data sources for simple ingestion of streaming and batch time-series data sources using Spark
- Variant types for managing variable data types and JSON payloads that are common in the real-time domain
- Delta liquid clustering for simple data clustering without the overhead or complexity of partitioning
In this presentation you will learn how data teams can leverage these latest features to build industry-leading, real-time data products using Spark and Delta, with real-world examples and metrics of the performance and processing improvements they deliver.
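Spark 4.0's Python Data Source API asks you to implement a reader whose read(partition) yields row tuples. A stand-alone sketch that mirrors that shape for a toy time-series source, without importing pyspark (the real base classes live in pyspark.sql.datasource; the sensor data here is made up):

```python
# Mirrors the shape of Spark 4.0's DataSourceReader.read(partition), which
# yields tuples that Spark turns into rows. Stand-alone sketch only.

class SensorReader:
    """Toy stand-in for a Python Data Source reader over time-series data."""
    def __init__(self, readings):
        self.readings = readings

    def read(self, partition):
        # Real readers would use `partition` to split work; this toy ignores it.
        for ts, value in self.readings:
            yield (ts, value)

reader = SensorReader([("2025-06-10T00:00:00Z", 21.5),
                       ("2025-06-10T00:01:00Z", 21.7)])
rows = list(reader.read(partition=0))
print(rows[0])  # ('2025-06-10T00:00:00Z', 21.5)
```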

Unlocking the Future of Dairy Farming: Leveraging Data Marketplaces at Lely

2025-06-10 Watch
talk
Simon Krejci (Lely), Bulut Ficici (Lely)

Lely, a Dutch company specializing in dairy farming robotics, helps farmers with advanced solutions for milking, feeding and cleaning. This session explores Lely’s implementation of an Internal Data Marketplace, built around Databricks' Private Exchange Marketplace. The marketplace serves as a central hub for data teams and business users, offering seamless access to data, analytics and dashboards. Powered by Delta Sharing, it enables secure, private listing of data products across business domains, including notebooks, views, models and functions. This session covers the pros and cons of this approach, best practices for setting up a data marketplace and its impact on Lely’s operations. Real-world examples and insights will showcase the potential of integrating data-driven solutions into dairy farming. Join us to discover how data innovation drives the future of dairy farming through Lely’s experience.

AI Agents in Action: Structuring Unstructured Data on Demand With Databricks and Unstructured

2025-06-10 Watch
lightning_talk
Christopher Maddock (Unstructured)

LLM agents aren’t just answering questions — they’re running entire workflows. In this talk, we’ll show how agents can autonomously ingest, process and structure unstructured data using Unstructured, with outputs flowing directly into Databricks. Powered by the Model Context Protocol (MCP), agents can interface with Unstructured’s full suite of capabilities — discovering documents across sources, building ephemeral workflows and exporting structured insights into Delta tables. We’ll walk through a demo where an agent responds to a natural language request, dynamically pulls relevant documents, transforms them into usable data and surfaces insights — fast. Join us for a sneak peek into the future of AI-native data workflows, where LLMs don’t just assist — they operate.

Breaking Silos: Cigna’s Journey to Seamless Data Sharing with Delta Sharing

2025-06-10 Watch
lightning_talk
Jay Ehlen (Evernorth Health Services), Nick De Young (The Cigna Group)

As data ecosystems grow increasingly complex, the ability to share data securely, seamlessly, and in real time has become a strategic differentiator. In this session, Cigna will showcase how Delta Sharing on Databricks has enabled them to modernize data delivery, reduce operational overhead, and unlock new market opportunities. Learn how Cigna achieved significant savings by streamlining operations, compute, and platform overhead for just one use case. Explore how decentralizing data ownership—transitioning from hyper-centralized teams to empowered product owners—has simplified delivery and accelerated innovation. Most importantly, see how this modern open data-sharing framework has positioned Cigna to win contracts they previously couldn’t, by enabling real-time, cross-organizational data collaboration with external partners. Join us to hear how Cigna is using Delta Sharing not just as a technical enabler, but as a business catalyst.

From Spaghetti Bowl Pipeline to Lakeflow Declarative Pipelines Efficiency

2025-06-10
lightning_talk
Peter Jones (Intermountain Healthcare)

In today's data-driven world, the ability to efficiently manage and transform data is crucial for any organization. This presentation will explore the process of converting a complex and messy workflow into clean, simple Lakeflow Declarative Pipelines at a large integrated health system, Intermountain Health. Alteryx is a powerful tool for data preparation and blending, but as workflows grow in complexity, they can become difficult to manage and maintain. Lakeflow Declarative Pipelines, on the other hand, offers a more democratized, streamlined and scalable approach to data engineering, leveraging the power of Apache Spark and Delta Lake. We will begin by examining a typical legacy workflow, identifying common pain points such as tangled logic, performance bottlenecks and maintenance challenges. Next, we will demonstrate how to translate this workflow into Lakeflow Declarative Pipelines, highlighting key steps such as data transformation, validation and delivery.

Petabyte-Scale On-Chain Insights: Real-Time Intelligence for the Next-Gen Financial Backbone

2025-06-10 Watch
lightning_talk
Leo Liang (CipherOwl Inc)

We’ll explore how CipherOwl Inc. constructed a near real-time, multi-chain data lakehouse to power anti-money laundering (AML) monitoring at petabyte scale. We will walk through the end-to-end architecture, which integrates cutting-edge open-source technologies and AI-driven analytics to handle massive on-chain data volumes seamlessly; off-chain intelligence complements this to meet rigorous AML requirements. At the core of our solution is ChainStorage, an open-source project started by Coinbase that provides robust blockchain data ingestion and block-level serving. We enhanced it with Apache Spark™ coupled with Apache Arrow™ for high-throughput processing and efficient data serialization, backed by Delta Lake and Kafka. For the serving layer, we employ StarRocks to deliver lightning-fast SQL analytics over vast datasets. Finally, our system incorporates machine learning and AI agents for continuous data curation and near real-time insights, which are crucial for tackling on-chain AML challenges.

Sponsored by: SAP | SAP and Databricks Open a Bold New Era of Data and AI​

2025-06-10 Watch
lightning_talk
H Nair (SAP)

SAP and Databricks have formed a landmark partnership that brings together SAP's deep expertise in mission-critical business processes and semantically rich data with Databricks' industry-leading capabilities in AI, machine learning, and advanced data engineering. From curated, SAP-managed data products to zero-copy Delta Sharing integration, discover how SAP Business Data Cloud empowers data and AI professionals to build AI solutions that unlock unparalleled business insights using trusted business data.

Apache Iceberg with Unity Catalog at HelloFresh

2025-06-10 Watch
talk
Max Schultze (HelloFresh), Adam Komisarek (HelloFresh)

Table formats like Delta Lake and Iceberg have been game changers for pushing lakehouse architecture into modern enterprises. The acquisition of Tabular added Iceberg to the Databricks ecosystem, an open format that was already well supported by processing engines across the industry. At HelloFresh we are building a lakehouse architecture that integrates many touchpoints and technologies across the organization. As such we chose Iceberg as the table format to bridge the gaps in our decentrally managed tech landscape. We are leveraging Unity Catalog as our Iceberg REST catalog of choice for storing metadata and managing tables. In this talk we will outline our architectural setup between Databricks, Spark, Flink and Snowflake, and will explain the native Unity Catalog Iceberg REST catalog as well as catalog federation towards connected engines. We will highlight the impact on our business and discuss the advantages and lessons learned from our early-adopter experience.
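For a sense of what "Unity Catalog as the Iceberg REST catalog" means in practice: an external Spark engine points an Iceberg catalog at the workspace's REST endpoint. A hedged sketch of the configuration involved; the workspace URL, catalog alias, token and warehouse name are placeholders, and the exact endpoint and auth options should be checked against the Unity Catalog documentation:

```python
# Hypothetical Spark settings for reading UC-managed Iceberg tables from an
# external engine via the Iceberg REST catalog interface. All values are
# placeholders; verify endpoint and auth against the current UC docs.

workspace = "https://example-workspace.cloud.databricks.com"

iceberg_rest_confs = {
    "spark.sql.catalog.uc": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.uc.type": "rest",
    "spark.sql.catalog.uc.uri": f"{workspace}/api/2.1/unity-catalog/iceberg",
    "spark.sql.catalog.uc.token": "<personal-access-token>",
    "spark.sql.catalog.uc.warehouse": "main",  # the UC catalog to expose
}

for key, value in sorted(iceberg_rest_confs.items()):
    print(key, "=", value)
```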

How Danone Enhanced Global Data Sharing with Delta Sharing

2025-06-10 Watch
talk
BASELTO Yohan (Danone), Gergő Pásztor (Databricks)

Learn how Danone, a global leader in the food industry, improved its data-sharing processes using Delta Sharing, an open protocol developed by Databricks. This session will explore how Danone migrated from a traditional hub-and-spoke model to a more efficient and scalable data-sharing approach that works seamlessly across regions and platforms. We’ll discuss practical concepts such as in-region and cross-region data sharing, fine-grained access control, data discovery, and the implementation of data contracts. You’ll also hear about the strategies Danone uses to deliver governed data efficiently while maintaining compliance with global regulations. Additionally, we’ll discuss a cost comparison between direct data access and replication. Finally, we’ll share insights into the challenges faced by global organizations in managing data sharing at scale and how Danone addressed these issues. Attendees will gain practical knowledge on building a reliable and secure data-sharing framework for international collaboration.

Sponsored by: AVEVA | CONNECT and Databricks IT-OT Convergence for Industrial Intelligence at Scale

2025-06-10
talk
Glenn Moffett (AVEVA), John Baier (AVEVA)

Industrial organizations are unlocking new possibilities through the partnership between AVEVA and Databricks. The seamless, no-code, zero-copy solution—powered by Delta Sharing and CONNECT—enables companies to combine IT and OT data effortlessly. By bridging the gap between operational and enterprise data, businesses can harness the power of AI, data science, and business intelligence at an unprecedented scale to drive innovation. In this session, explore real-world applications of this integration, including how industry leaders are using CONNECT and Databricks to boost efficiency, reduce costs, and advance sustainability—all without fragmented point solutions. You’ll also see a live demo of the integration, showcasing how secure, scalable access to trusted industrial data is enabling new levels of industrial intelligence across sectors like mining, manufacturing, power, and oil and gas.

Delta Lake and the Data Mesh

2025-06-10 Watch
talk
KyJah Keys (Nextdata)

Delta Lake has proven to be an excellent storage format. Coupled with the Databricks platform, the storage format has shined as a component of a distributed system on the lakehouse. The pairing of Delta and Spark provides an excellent platform, but users often struggle to perform comparable work outside of the Spark ecosystem. Tools such as delta-rs, Polars and DuckDB have brought access to users outside of Spark, but they are only building blocks of a larger system. In this 40-minute talk we will demonstrate how users can use data products on the Nextdata OS data mesh to interact with the Databricks platform to drive Delta Lake workflows. Additionally, we will show how users can build autonomous data products that interact with their Delta tables both inside and outside of the lakehouse platform. Attendees will learn how to integrate the Nextdata OS data mesh with the Databricks platform as both an external and integral component.

From Metadata to Agents: Building the future of content understanding with Coactive AI + Databricks

2025-06-10
talk
Augusto Moreno (NBC Universal), William Gaviria Rojas (Coactive AI)

Media enterprises generate vast amounts of visual content, but unlocking its full potential requires multimodal AI at scale. Coactive AI and NBCUniversal’s Corporate Decision Sciences team are transforming how enterprises discover and understand visual content. We explore how Coactive AI and Databricks, from Delta Sharing to Genie, can revolutionize media content search, tagging and enrichment, enabling new levels of collaboration. Attendees will see how this AI-powered approach fuels AI workflows, enhances BI insights and drives new applications, from automating cut sheet generation to improving content compliance and recommendations. By structuring and sharing enriched media metadata, Coactive AI and NBCU are unlocking deeper intelligence and laying the groundwork for agentic AI systems that retrieve, interpret and act on visual content. This session will showcase real-world examples of these AI agents and how they can reshape future content discovery and media workflows.

Securing Data Collaboration: A Deep Dive Into Security, Frameworks, and Use Cases

2025-06-10 Watch
talk
El Ghali Benchekroun (Databricks), Bilal Obeidat (Databricks), Bhavin Kukadia (Databricks)

This session will focus on the security aspects of Databricks Delta Sharing, Databricks Cleanrooms and Databricks Marketplace, providing an exploration of how these solutions enable secure and scalable data collaboration while prioritizing privacy. Highlights:
- Use cases: understand how Delta Sharing facilitates governed, real-time data exchange across platforms and how Cleanrooms support multi-party analytics without exposing sensitive information
- Security internals: dive into Delta Sharing's security frameworks
- Dynamic views: learn about fine-grained security controls
- Privacy-first Cleanrooms: explore how Cleanrooms enable secure analytics while maintaining strict data privacy standards
- Private exchanges: explore the role of private exchanges using Databricks Marketplace in securely sharing custom datasets and AI models with specific partners or subsidiaries
- Network security & compliance: review best practices for network configurations and compliance measures

Sponsored by: Amperity | Transforming Guest Experiences: GoTo Foods’ Data Journey with Amperity & Databricks

2025-06-10 Watch
talk
Brett Newcome (GoTo Foods), Manuel Valdes (GoTo Foods)

GoTo Foods, the platform company behind brands like Auntie Anne’s, Cinnabon, Jamba, and more, set out to turn a fragmented data landscape into a high-performance customer intelligence engine. In this session, CTO Manuel Valdes and Director of Marketing Technology Brett Newcome share how they unified data using Databricks Delta Sharing and Amperity’s Customer Data Cloud to speed up time to market. As part of GoTo’s broader strategy to support its brands with shared enterprise tools, the team:
- Unified loyalty, catering, and retail data into one customer view
- Cut campaign lead times from weeks to hours
- Activated audiences in real time without straining engineering
- Unlocked new revenue through smarter segmentation and personalization

SQL-Based ETL: Options for SQL-Only Databricks Development

2025-06-10 Watch
talk
Dustin Vannoy (Databricks)

Using SQL for data transformation is a powerful way for an analytics team to create their own data pipelines. However, relying on SQL often comes with tradeoffs such as limited functionality, hard-to-maintain stored procedures or skipping best practices like version control and data tests. Databricks supports building high-performing SQL ETL workloads. Attend this session to hear how Databricks supports SQL for data transformation jobs as a core part of your Data Intelligence Platform. In this session we will cover four options for using Databricks with SQL syntax to create Delta tables:
- Lakeflow Declarative Pipelines: a declarative ETL option to simplify batch and streaming pipelines
- dbt: an open-source framework to apply engineering best practices to SQL-based data transformations
- SQLMesh: an open-core product to easily build high-quality and high-performance data pipelines
- SQL notebook jobs: a combination of Databricks Workflows and parameterized SQL notebooks
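The "SQL notebook jobs" option pairs a Workflows job with a parameterized SQL notebook: the job supplies parameters and the SQL is rendered per run. A minimal sketch of the idea in plain Python (in a real notebook the parameters would come from widgets or job parameters; the catalog, schema and table names are hypothetical):

```python
# Sketch of a parameterized SQL notebook job: parameters in, rendered SQL
# out. Names are placeholders; a real notebook would execute the statement.

def render_sql(catalog: str, schema: str, run_date: str) -> str:
    return (
        f"CREATE OR REPLACE TABLE {catalog}.{schema}.daily_sales AS "
        f"SELECT * FROM {catalog}.{schema}.raw_sales "
        f"WHERE sale_date = '{run_date}'"
    )

# Each scheduled run passes a different run_date
print(render_sql("main", "finance", "2025-06-10"))
```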

Transforming Financial Intelligence with FactSet Structured and Unstructured Data and Delta Sharing

2025-06-10 Watch
talk
Kristen Clark (FactSet), Keon Shahab (Databricks)

Join us to explore the dynamic partnership between FactSet and Databricks, transforming data accessibility and insights. Discover the launch of FactSet’s Structured DataFeeds via Delta Sharing on the Databricks Marketplace, enhancing access to crucial financial data insights. Learn about the advantages of streamlined data delivery and how this integration empowers data ecosystems. Beyond structured data, explore the innovative potential of vectorized data sharing of unstructured content such as news, transcripts, and filings. Gain insights into the importance of seamless vectorized data delivery to support GenAI applications and how FactSet is preparing to simplify client GenAI workflows with AI-ready data. Experience a demo that showcases the complete journey from data delivery to actionable GenAI application responses in a real-world Financial Services scenario. See firsthand how FactSet is simplifying client GenAI workflows with AI-ready data that drives faster, more informed financial decisions.

Unlocking AI Value: Build AI Agents on SAP Data in Databricks

2025-06-10 Watch
talk
Qi Su (Databricks)

Discover how enterprises are turning SAP data into intelligent AI. By tapping into contextual SAP data through Delta Sharing on Databricks (no messy ETL needed), they’re accelerating AI innovation and business insights. Learn how they:
- Build domain-specific AI that can reason on private SAP data
- Deliver data intelligence to power insights for business leaders
- Govern and secure their new unified data estate

Breaking Silos: Enabling Databricks-Snowflake Interoperability With Iceberg and Unity Catalog

2025-06-10 Watch
talk
Mohit Kumar (T-Mobile), Geoffrey Freeman (T-Mobile)

As data ecosystems grow more complex, organizations often struggle with siloed platforms and fragmented governance. In this session, we’ll explore how our team made Databricks the central hub for cross-platform interoperability, enabling seamless Snowflake integration through Unity Catalog and the Iceberg REST API. We’ll cover:
- Why interoperability matters and the business drivers behind our approach
- How Unity Catalog and UniForm simplify interoperability, allowing Databricks to expose an Iceberg REST API for external consumption
- A technical deep dive into data sharing, query performance, and access control across Databricks and Snowflake
- Lessons learned and best practices for building a multi-engine architecture while maintaining governance and efficiency
By leveraging UniForm, Delta, and Iceberg, we created a flexible, vendor-agnostic architecture that bridges Databricks and Snowflake without compromising performance or security.
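UniForm, which the session leans on, is enabled per Delta table through table properties so Iceberg clients such as Snowflake can read the table. A hedged sketch of the DDL involved; the property names follow the Delta UniForm documentation, while the table name is a placeholder:

```python
# Builds the ALTER TABLE statement that turns on UniForm Iceberg metadata
# generation for an existing Delta table. Table name is hypothetical;
# property names per the Delta UniForm docs (verify against current docs).

def enable_uniform_ddl(table: str) -> str:
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.enableIcebergCompatV2' = 'true', "
        "'delta.universalFormat.enabledFormats' = 'iceberg')"
    )

print(enable_uniform_ddl("main.telecom.subscribers"))
```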