talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

105

Filtering by: Data Lakehouse

Sessions & talks

Showing 1–25 of 105 · Newest first

Capitalizing Alternatives Data on the Addepar Platform: Private Markets Benchmarking

2025-06-12 Watch
lightning_talk
Ricky D'Sa (Addepar)

Addepar possesses an enormous private-investment data set, with 40% of the $7T in assets on the platform allocated to alternatives. Leveraging the Addepar Data Lakehouse (ADL), built on Databricks, we have built a scalable data pipeline that assesses millions of private fund investment cash flows and translates them into a private fund benchmarks data offering. Investors on the Addepar platform can leverage this data, seamlessly integrated with their portfolio investments, to obtain actionable investment insights. At a high level, the offering consists of extensive data aggregation, filtering, and construction logic that updates dynamically for clients through Databricks job workflows. The derived dataset has gone through several iterations with investment strategists and academics, who accessed it via Delta Sharing tables. Irrespective of the data source, the pipeline coalesces all relevant cash flow activity against a unique identifier before constructing the benchmarks.
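
The coalescing step described in the abstract can be sketched in plain Python. The record shape and field names (fund_id, date, amount) are illustrative assumptions, not Addepar's actual schema:

```python
from collections import defaultdict

def coalesce_cash_flows(records):
    """Group raw cash-flow records from any source by a unique fund
    identifier, summing same-day flows — a stand-in for the coalescing
    step described in the talk (field names are hypothetical)."""
    by_fund = defaultdict(lambda: defaultdict(float))
    for rec in records:
        by_fund[rec["fund_id"]][rec["date"]] += rec["amount"]
    # Return per-fund flows sorted by date, ready for benchmark math.
    return {fund: sorted(flows.items()) for fund, flows in by_fund.items()}

flows = coalesce_cash_flows([
    {"fund_id": "F1", "date": "2024-01-02", "amount": -100.0},  # capital call
    {"fund_id": "F1", "date": "2024-01-02", "amount": -50.0},   # same-day call
    {"fund_id": "F1", "date": "2024-06-30", "amount": 180.0},   # distribution
])
```

In production this would be a Spark job on ADL; the sketch only shows the shape of the aggregation.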

Sponsored by: C2S Technologies Inc. | Qbeast: Lakehouse Acceleration as a Service

2025-06-12 Watch
lightning_talk

While modern lakehouse architectures and open table formats provide flexibility, they are often challenging to manage: data layouts, clustering, and small files all need attention for efficiency. Qbeast’s platform-independent, patented multi-column indexing optimizes lakehouse data layout, accelerates queries, and sharply reduces compute cost — without disrupting existing architectures. Qbeast also handles high-cardinality clustering and supports incremental updates. Join us to explore how Qbeast enables efficient, scalable, AI-ready data infrastructure — reducing compute costs independent of data platform and compute engine.
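
Qbeast's patented index is proprietary, but the general idea of multi-column clustering can be illustrated with a toy Z-order (bit-interleaving) key. This is a generic technique, not Qbeast's algorithm:

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Toy Z-order key: interleave the bits of two column values so rows
    close in both dimensions get nearby keys. Illustrates multi-column
    clustering in general; Qbeast's actual index works differently."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x occupies even bit slots
        z |= ((y >> i) & 1) << (2 * i + 1)   # y occupies odd bit slots
    return z

# Sorting by the interleaved key co-locates rows that are close in BOTH
# columns, which helps data skipping on filters over either column.
rows = [(3, 5), (0, 0), (3, 4), (1, 1)]
clustered = sorted(rows, key=lambda r: interleave_bits(*r))
```

Linear sort keys over one column only help filters on that column; an interleaved key spreads skipping benefit across all indexed columns.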

Welcome Lakehouse, from a DWH transformation to an M&A data sharing

2025-06-12 Watch
lightning_talk
Gianfranco Arena (DXC Technology)

At DXC, we helped our customer Fastweb with their "Welcome Lakehouse" project, a data warehouse transformation from on-premises to Databricks on AWS. But the implementation became something more. Thanks to features such as Lakehouse Federation and Delta Sharing, from the first day of the Fastweb+Vodafone merger we have been able to connect two different platforms with ease and let the business focus on the value of data rather than on IT integration. This session will feature our customer Alessandro Gattolin of Fastweb, who will speak about the experience.

Daft and Unity Catalog: A Multimodal/AI-Native Lakehouse

2025-06-12 Watch
talk
Jay Chia (Eventual)

Modern data organizations have moved beyond big data analytics to also incorporate advanced AI/ML data workloads. These workflows often involve multimodal datasets containing documents, images, long-form text, embeddings, URLs and more. Unity Catalog is an ideal solution for organizing and governing this data at scale. When paired with the Daft open source data engine, you can build a truly multimodal, AI-ready data lakehouse. In this session, we’ll explore how Daft integrates with Unity Catalog’s core features (such as volumes and functions) to enable efficient, AI-driven data lakehouses. You will learn how to ingest and process multimodal data (images, text and videos), run AI/ML transformations and feature extractions at scale, and maintain full control and visibility over your data with Unity Catalog’s fine-grained governance.

Databricks + Apache Iceberg™: Managed and Foreign Tables in Unity Catalog

2025-06-12 Watch
talk
Jonathan Brito (Databricks)

Unity Catalog support for Apache Iceberg™ brings open, interoperable table formats to the heart of the Databricks Lakehouse. In this session, we’ll introduce new capabilities that allow you to write Iceberg tables from any REST-compatible engine, apply fine-grained governance across all data, and unify access to external Iceberg catalogs like AWS Glue, Hive Metastore, and Snowflake Horizon. Learn how Databricks is eliminating data silos, simplifying performance with Predictive Optimization, and advancing a truly open lakehouse architecture with Delta and Iceberg side by side.

Rust and Lakehouse Format — Ask Us Anything

2025-06-12
lightning_talk
Denny Lee (Databricks), Robert Pack (Databricks), Tyler Croy (Scribd, Inc.)

Join us for an in-depth Ask Me Anything (AMA) on how Rust is revolutionizing Lakehouse formats like Delta Lake and Apache Iceberg through projects like delta-rs and iceberg-rs! Discover how Rust’s memory safety, zero-cost abstractions and fearless concurrency unlock faster development and higher-performance data operations. Whether you’re a data engineer, Rustacean or Lakehouse enthusiast, bring your questions on how Rust is shaping the future of open table formats!

Sponsored by: DataHub | Beyond the Lakehouse: Supercharging Databricks with Contextual Intelligence

2025-06-12 Watch
lightning_talk
Gabriel Lyons (DataHub)

While Databricks powers your data lakehouse, DataHub delivers the critical context layer connecting your entire ecosystem. We'll demonstrate how DataHub extends Unity Catalog to provide comprehensive metadata intelligence across platforms. With DataHub's real-time platform you can:

- Cut AI model time-to-market with unified REST and GraphQL APIs that ensure models train on reliable and compliant data from across platforms, with complete lineage tracking
- Decrease data incidents by 60% using an event-driven architecture that instantly propagates changes across systems
- Transform data discovery from days to minutes with AI-powered search and natural language interfaces

Leaders use DataHub to transform Databricks data into integrated insights that drive business value. See our demo of syncback technology—detecting sensitive data and enforcing Databricks access controls automatically—plus our AI assistant that enhances LLMs with cross-platform metadata.

Sponsored by: definity | How You Could Be Saving 50% of Your Spark Costs

2025-06-12 Watch
lightning_talk
Roy Daniel (definity)

Enterprise lakehouse platforms are rapidly scaling – and so are complexity and cost. After monitoring over 1B vCore-hours across Databricks and other Apache Spark™ environments, we consistently saw resource waste, preventable data incidents, and painful troubleshooting. Join this session to discover how definity’s unique full-stack observability provides job-level visibility in-motion, unifying infrastructure performance, pipeline execution, and data behavior, and see how enterprise teams use definity to easily optimize jobs and save millions – while proactively ensuring SLAs, preventing issues, and simplifying RCA.

Eliminate Hops in Your Streaming Architecture with Zerobus, Part of Lakeflow Connect

2025-06-12
talk
Victoria Bukta (Databricks), Nikola Obradovic (Databricks)

In this session, we’ll introduce Zerobus Direct Write API, part of Lakeflow Connect, which enables you to push data directly to your lakehouse and simplify ingestion for IoT, clickstreams, telemetry, and more. We’ll start with an overview of the ingestion landscape to date. Then, we'll cover how you can “shift left” with Zerobus, embedding data ingestion into your operational systems to make analytics and AI a core component of the business, rather than an afterthought. The result is a significantly simpler architecture that scales your operations, using this new paradigm to skip unnecessary hops. We'll also highlight one of our early customers, Joby Aviation, and how they use Zerobus. Finally, we’ll provide a framework to help you understand when to use Zerobus versus other ingestion offerings—and we’ll wrap up with a live Q&A so that you can hit the ground running with your own use cases.

Sponsored by: Soda Data Inc. | Clean Energy, Clean Data: How Data Quality Powers Decarbonization

2025-06-12 Watch
lightning_talk
Robert Young (BDO Canada)

Drawing on BDO Canada’s deep expertise in the electricity sector, this session explores how clean energy innovation can be accelerated through a holistic approach to data quality. Discover BDO’s practical framework for implementing data quality and rebuilding trust in data through a structured, scalable approach. BDO will share a real-world example of monitoring data at scale—from high-level executive dashboards to the details of daily ETL and ELT pipelines. Learn how they leveraged Soda’s data observability platform to unlock near-instant insights, and how they moved beyond legacy validation pipelines with built-in checks across their production Lakehouse. Whether you're a business leader defining data strategy or a data engineer building robust data products, this talk connects the strategic value of clean data with actionable techniques to make it a reality.
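
Soda's checks are declared in its own configuration language (SodaCL) and evaluated by its platform; as a rough stand-in, here is what one such built-in check (a null-rate threshold) amounts to in plain Python, with invented meter data and a hypothetical 5% threshold:

```python
def null_rate(rows, column):
    """Fraction of rows where `column` is missing — the kind of metric a
    data-quality monitor evaluates against a threshold. Pure-Python
    stand-in, not Soda's actual implementation."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)

readings = [
    {"meter_id": "m1", "kwh": 12.5},
    {"meter_id": "m2", "kwh": None},   # failed sensor read
    {"meter_id": "m3", "kwh": 9.1},
]
rate = null_rate(readings, "kwh")
check_passed = rate <= 0.05  # hypothetical quality threshold: at most 5% nulls
```

In a real deployment such checks run inside the production pipelines, with failures surfaced to dashboards rather than computed ad hoc.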

Building Responsible AI Agents on Databricks

2025-06-12 Watch
talk
Pavithra Rao (Databricks), Yassine Essawabi (Databricks)

This presentation explores how Databricks' Data Intelligence Platform supports the development and deployment of responsible AI in credit decisioning, ensuring fairness, transparency and regulatory compliance. Key areas include bias and fairness monitoring using Lakehouse Monitoring to track demographic metrics and automated alerts for fairness thresholds. Transparency and explainability are enhanced through the Mosaic AI Agent Framework, SHAP values and LIME for feature importance auditing. Regulatory alignment is achieved via Unity Catalog for data lineage and AI/BI dashboards for compliance monitoring. Additionally, LLM reliability and security are ensured through AI guardrails and synthetic datasets to validate model outputs and prevent discriminatory patterns. The platform integrates real-time SME and user feedback via Databricks Apps and AI/BI Genie Space.
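
As a rough sketch of the fairness monitoring described above, here is a minimal demographic selection-rate check in plain Python. The group labels, decisions, and four-fifths threshold are illustrative assumptions, not Lakehouse Monitoring's actual API:

```python
def selection_rates(decisions):
    """Approval rate per demographic group — the sort of fairness metric
    the talk describes tracking, computed here by hand for illustration."""
    totals, approved = {}, {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if ok else 0)
    return {g: approved[g] / totals[g] for g in totals}

decisions = [("A", True), ("A", True), ("A", False),
             ("B", True), ("B", False), ("B", False)]
rates = selection_rates(decisions)
# Four-fifths rule: alert if any group's approval rate falls below
# 80% of the highest group's rate.
alert = min(rates.values()) < 0.8 * max(rates.values())
```

A monitoring system would evaluate this continuously over fresh decisions and route the alert to an on-call channel.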

Get the Most Out of Your Delta Lake

2025-06-12 Watch
lightning_talk
Youssef Mrini (Databricks)

Unlock the full potential of Delta Lake, the open-source storage framework for Apache Spark, with this session focused on its latest and most impactful features. Discover how capabilities like Time Travel, Column Mapping, Deletion Vectors, Liquid Clustering, UniForm interoperability, and Change Data Feed (CDF) can transform your data architecture. Learn not just what these features do, but when and how to use them to maximize performance, simplify data management, and enable advanced analytics across your lakehouse environment.

Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox

2025-06-12 Watch
talk
Tim Kessler (Redox, Inc.), Matthew Giglia (Databricks)

The direct Redox & Databricks integration can streamline your interoperability workflows, from responding in record time to preauthorization requests to letting attending physicians know, in near real time from ADTs, about a change in risk for sepsis and readmission. Data engineers will learn how to create fully streaming ETL pipelines for ingesting, parsing and acting on insights from Redox FHIR bundles delivered directly to Unity Catalog volumes. Once the data is available in the Lakehouse, AI/BI Dashboards and agentic frameworks help write FHIR messages back to Redox for direct pushdown to EMR systems. Parsing FHIR bundle resources has never been easier with SQL combined with the new VARIANT data type in Delta and streaming table creation against Serverless DBSQL warehouses. We'll also use the Databricks accelerators dbignite and redoxwrite for writing and posting FHIR bundles back to Redox-integrated EMRs, and we'll extend AI/BI with Unity Catalog SQL UDFs and the Redox API for use in Genie.
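
The talk's pipeline parses FHIR bundles with SQL and the VARIANT type on Databricks; the same path navigation can be sketched with Python's stdlib json module. The bundle fragment below follows the FHIR Bundle structure but its payload values are invented:

```python
import json

# Minimal FHIR bundle fragment (structure per the FHIR spec; values invented).
bundle_json = """
{
  "resourceType": "Bundle",
  "type": "message",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1"}},
    {"resource": {"resourceType": "Observation", "id": "o1",
                  "code": {"text": "sepsis risk"}}}
  ]
}
"""

def resources_of_type(bundle: dict, resource_type: str):
    """Pull all resources of one type out of a bundle — the same kind of
    path navigation VARIANT columns allow with SQL operators."""
    return [
        e["resource"] for e in bundle.get("entry", [])
        if e.get("resource", {}).get("resourceType") == resource_type
    ]

bundle = json.loads(bundle_json)
observations = resources_of_type(bundle, "Observation")
```

On Databricks the equivalent extraction would be a SQL expression over a VARIANT column rather than Python; the sketch only shows the shape of the data.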

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

2025-06-12 Watch
lightning_talk
Alex Dean (Snowplow)

In the competitive gaming industry, understanding player behavior is key to delivering engaging experiences. Supercell, creators of Clash of Clans and Brawl Stars, faced challenges with fragmented data and limited visibility into user journeys. To address this, they partnered with Snowplow and Databricks to build a scalable, privacy-compliant data platform for real-time insights. By leveraging Snowplow’s behavioral data collection and Databricks’ Lakehouse architecture, Supercell achieved:

- Cross-platform data unification: a unified view of player actions across web, mobile and in-game
- Real-time analytics: streaming event data into Delta Lake for dynamic game balancing and engagement
- Scalable infrastructure: supporting terabytes of data during launches and live events
- AI & ML use cases: churn prediction and personalized in-game recommendations

This session explores Supercell’s data journey and AI-driven player engagement strategies.

ClickHouse and Databricks for Real-Time Analytics

2025-06-11 Watch
talk
Melvyn Peignon (ClickHouse)

ClickHouse is a C++-based, column-oriented database built for real-time analytics. While it has its own internal storage format, the rise of open lakehouse architectures has created a growing need for seamless interoperability. In response, we have developed integrations with your favorite lakehouse ecosystem to enhance compatibility, performance and governance. From integrating with Unity Catalog to embedding the Delta Kernel into ClickHouse, this session will explore the key design considerations behind these integrations, their benefits to the community, the lessons learned and future opportunities for improved compatibility and seamless integration.

End-to-End Interoperable Data Platform: How Bosch Leverages Databricks Supply Chain Consolidation

2025-06-11 Watch
talk
Satish Karunakaran (Robert Bosch GmbH), Marc-Alexander Frey (Robert Bosch GmbH)

This session will showcase Bosch’s journey in consolidating supply chain information using the Databricks platform. It will dive into how Databricks not only acts as the central data lakehouse but also integrates seamlessly with transformative components such as dbt and Large Language Models (LLMs). The talk will highlight best practices, architectural considerations, and the value of an interoperable platform in driving actionable insights and operational excellence across complex supply chain processes. Key topics and sections:

- Introduction & business context: a brief overview of Bosch’s supply chain challenges and the need for a consolidated data platform; the strategic importance of data-driven decision-making in a global supply chain environment
- Databricks as the core data platform
- Integrating dbt for transformation
- Leveraging LLMs for enhanced insights

Extending the Lakehouse: Power Interoperable Compute With Unity Catalog Open APIs

2025-06-11 Watch
talk
Tathagata Das (Databricks), Michelle Leon (Databricks)

The lakehouse is built for storage flexibility, but what about compute? In this session, we’ll explore how Unity Catalog enables you to connect and govern multiple compute engines across your data ecosystem. With open APIs and support for the Iceberg REST Catalog, UC lets you extend access to engines like Trino, DuckDB, and Flink while maintaining centralized security, lineage, and interoperability. We will show how you can get started today working with engines like Apache Spark and Starburst to read and write to UC managed tables with some exciting demos. Learn how to bring flexibility to your compute layer—without compromising control.

Hands-on Learning: Databricks SQL in Action: Intelligent Data Warehousing, Analytics and BI Workshop (repeat)

2025-06-11
workshop
Pearl Ubaru (Databricks)

Most organizations run complex cloud data architectures that silo applications, users and data. Join this interactive hands-on workshop to learn how Databricks SQL allows you to operate a multi-cloud lakehouse architecture that delivers data warehouse performance at data lake economics — with up to 12x better price/performance than traditional cloud data warehouses. Here’s what we’ll cover:

- How Databricks SQL fits in the Data Intelligence Platform, enabling you to operate a multicloud lakehouse architecture that delivers data warehouse performance at data lake economics
- How to manage and monitor compute resources, data access and users across your lakehouse infrastructure
- How to query directly on your data lake using your tools of choice or the built-in SQL editor and visualizations
- How to use AI to increase productivity when querying, completing code or building dashboards

Ask your questions during this hands-on lab, and the Databricks experts will guide you.

HP's Data Platform Migration Journey: Redshift to Lakehouse

2025-06-11 Watch
talk
Isaac Chan (HP Inc.), Kavya Atmakuri (HP Inc.)

HP Print's data platform team took on a migration from a monolithic, shared AWS Redshift resource to a modular and scalable data ecosystem on the Databricks lakehouse. The result was 30–40% cost savings, scalable and isolated resources for different data consumers and ETL workloads, and performance optimization for a variety of query types. The migration surfaced technical challenges and learnings around the ETL migrations with dbt, new Databricks features like Liquid Clustering, Predictive Optimization, Photon and SQL serverless warehouses, managing multiple teams on Unity Catalog, and more. This presentation dives into both the business and technical sides of this migration. Come along as we share our key takeaways from this journey.

Summit Live: OLTP for the Lakehouse

2025-06-11 Watch
talk
Dave Nettleton (Databricks)

Analytical and operational use cases are starting to converge, and AI-assisted applications are accelerating the trend. Most applications require a transactional OLTP database to power their data. Hear from a Databricks expert on the latest developments and our strategy for bringing operational data into the lakehouse.

How Serverless Empowered Nationwide to Build Cost-Efficient and World Class BI

2025-06-11 Watch
lightning_talk
Ananya Ghosh (Nationwide)

Databricks’ Serverless compute streamlines infrastructure setup and management, delivering unparalleled performance and cost optimization for Data and BI workflows. In this presentation, we will explore how Nationwide is leveraging Databricks’ serverless technology and unified governance through Unity Catalog to build scalable, world-class BI solutions. Key features like AI/BI Dashboards, Genie, Materialized Views, Lakehouse Federation and Lakehouse Apps, all powered by serverless, have empowered business teams to deliver faster, scalable and smarter insights. We will show how Databricks’ serverless technology is enabling Nationwide to unlock new levels of efficiency and business impact, and how other organizations can adopt serverless technology to realize similar benefits.

Developing the Dreamers of Data + AI’s Future: How 84.51˚ builds upskilling to accelerate adoption

2025-06-11 Watch
talk
Michael Carrico (84.51)

“Once an idea has taken hold of the brain it's almost impossible to eradicate. An idea that is fully formed — fully understood — that sticks, right in there somewhere.” The Data Scientists and Engineers at 84.51˚ utilize the Databricks Lakehouse for a wide array of tasks, including data exploration, analysis, machine learning operations, orchestration, automated deployments and collaboration. In this talk, 84.51˚’s Data Science Learning Lead, Michael Carrico, will share their approach to upskilling a diverse workforce to support the company’s strategic initiatives. This approach includes creating tailored learning experiences for a variety of personas using content curated in partnership with Databricks’ educational offerings. Then he will demonstrate how he puts his 11 years of data science and engineering experience to work by using the Databricks Lakehouse not just as a subject, but also as a tool to create impactful training experiences and a learning culture at 84.51˚.

Iceberg Table Format Adoption and Unified Metadata Catalog Implementation in Lakehouse Platform

2025-06-11 Watch
talk
Ruotian Wang (DoorDash), Sergey Zavgorodni (DoorDash)

DoorDash's data organization is actively adopting the lakehouse paradigm. This presentation describes a methodology for migrating classic data warehouse and data lake platforms to a unified lakehouse solution. The objectives of this effort include:

- Elimination of excessive data movement
- Seamless integration and consolidation of the query engine layers, including Snowflake, Databricks, EMR and Trino
- Query performance optimization
- Abstracting away the complexity of underlying storage layers and table formats
- A strategic, justified decision on the unified metadata catalog used across various compute platforms

Multi-Statement Transactions: How to Improve Data Consistency and Performance

2025-06-11 Watch
lightning_talk
Franco Patano (Databricks)

Multi-statement transactions bring the atomicity and reliability of traditional databases to modern data warehousing on the lakehouse. In this session, we’ll explore real-world patterns enabled by multi-statement transactions — including multi-table updates, deduplication pipelines and audit logging — and show how Databricks ensures atomicity and consistency across complex workflows. We’ll also dive into demos and share tips for getting started and migrating to this feature in Databricks SQL.
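
The feature discussed is specific to Databricks SQL, but the atomicity pattern itself (several statements committing or rolling back as one unit) can be sketched with Python's stdlib sqlite3. Table names and amounts are invented:

```python
import sqlite3

# A multi-table update that must land atomically: debit one account,
# credit another, and record an audit row — all or nothing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL);
    CREATE TABLE audit_log (event TEXT);
    INSERT INTO accounts VALUES ('a', 100.0), ('b', 0.0);
""")

try:
    with conn:  # one transaction: all three statements commit or none do
        conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 'a'")
        conn.execute("UPDATE accounts SET balance = balance + 40 WHERE id = 'b'")
        conn.execute("INSERT INTO audit_log VALUES ('transfer a->b 40')")
except sqlite3.Error:
    pass  # on any failure the whole transfer rolls back

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

Without the enclosing transaction, a failure between the two UPDATEs would leave the books unbalanced; that is the gap multi-statement transactions close on the lakehouse.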

Sponsored by: Dataiku | Engineering Trustworthy AI Agents with LLM Mesh + Mosaic AI

2025-06-11 Watch
lightning_talk
Dmitri Ryssev (Dataiku)

AI agent systems hold immense promise for automating complex tasks and driving intelligent decision‑making, but only when they are engineered to be both resilient and transparent. In this session we will explore how Dataiku’s LLM Mesh pairs with Databricks Mosaic AI to streamline the entire lifecycle: ingesting and preparing data in the Lakehouse, prompt engineering LLMs hosted on Mosaic AI Model Serving Endpoints, visually orchestrating multi‑step chains, and monitoring them in real time. We’ll walk through a live demo of a Dataiku flow that connects to a Databricks hosted model, adds automated validation, lineage, and human‑in‑the‑loop review, then exposes the agent via Dataiku's Agent Connect interface. You’ll leave with actionable patterns for setting guardrails, logging decisions, and surfacing explanations—so your organization can deploy trustworthy domain‑specific agents faster & safer.