Data Lakehouse

Capitalizing Alternatives Data on the Addepar Platform: Private Markets Benchmarking

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Ricky D'Sa (Addepar)

Databricks Delta

Addepar possesses an enormous private investment data set with 40% of the $7T assets on the platform allocated to alternatives. Leveraging the Addepar Data Lakehouse (ADL), built on Databricks, we have built a scalable data pipeline that assesses millions of private fund investment cash flows and translates it to a private fund benchmarks data offering. Investors on the Addepar platform can leverage this data seamlessly integrated against their portfolio investments and obtain actionable investment insights. At a high-level, this data offering consists of an extensive data aggregation, filtering, and construction logic that dynamically updates for clients through the Databricks job workflows. This derived dataset has gone through several iterations with investment strategists and academics that leveraged delta shared tables. Irrespective of the data source, the data pipeline coalesces all relevant cash flow activity against a unique identifier before constructing the benchmarks.

Sponsored by: C2S Technologies Inc. | Qbeast: Lakehouse Acceleration as a Service

Welcome Lakehouse, from a DWH transformation to a M&A data sharing

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Gianfranco Arena (Dxc Technology)

AWS Databricks Delta DWH

At DXC, we helped our customer FastWeb with their "Welcome Lakehouse" project - a data warehouse transformation from on-premises to Databricks on AWS. But the implementation became something more. Thanks to features such as Lakehouse Federation and Delta Sharing, from the first day of the Fastweb+Vodafone merger, we have been able to connect two different platforms with ease and make the business focus on the value of data and not on the IT integration. This session will feature our customer Alessandro Gattolin of Fastweb to talk about the experience.

Daft and Unity Catalog: A Multimodal/AI-Native Lakehouse

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Jay Chia (Eventual)

AI/ML Analytics Big Data Data Analytics

Modern data organizations have moved beyond big data analytics to also incorporate advanced AI/ML data workloads. These workflows often involve multimodal datasets containing documents, images, long-form text, embeddings, URLs and more. Unity Catalog is an ideal solution for organizing and governing this data at scale. When paired with the Daft open source data engine, you can build a truly multimodal, AI-ready data lakehouse. In this session, we’ll explore how Daft integrates with Unity Catalog’s core features (such as volumes and functions) to enable efficient, AI-driven data lakehouses. You will learn how to ingest and process multimodal data (images, text and videos), run AI/ML transformations and feature extractions at scale, and maintain full control and visibility over your data with Unity Catalog’s fine-grained governance.

Databricks + Apache Iceberg™: Managed and Foreign Tables in Unity Catalog

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Jonathan Brito (Databricks)

AWS AWS Glue Databricks Delta Hive Iceberg Snowflake

Unity Catalog support for Apache Iceberg™ brings open, interoperable table formats to the heart of the Databricks Lakehouse. In this session, we’ll introduce new capabilities that allow you to write Iceberg tables from any REST-compatible engine, apply fine-grained governance across all data, and unify access to external Iceberg catalogs like AWS Glue, Hive Metastore, and Snowflake Horizon. Learn how Databricks is eliminating data silos, simplifying performance with Predictive Optimization, and advancing a truly open lakehouse architecture with Delta and Iceberg side by side.

Rust and Lakehouse Format — Ask Us Anything

2025-06-12 · Data + AI Summit 2025

lightning_talk

by Robert Pack (Databricks) , Denny Lee (Databricks) , Tyler Croy (Scribd, Inc.)

Delta Iceberg Rust

Join us for an in-depth Ask Me Anything (AMA) on how Rust is revolutionizing Lakehouse formats like Delta Lake and Apache Iceberg through projects like delta-rs and iceberg-rs! Discover how Rust’s memory safety, zero-cost abstractions and fearless concurrency unlock faster development and higher-performance data operations. Whether you’re a data engineer, Rustacean or Lakehouse enthusiast, bring your questions on how Rust is shaping the future of open table formats!

Sponsored by: DataHub | Beyond the Lakehouse: Supercharging Databricks with Contextual Intelligence

Eliminate Hops in Your Streaming Architecture with Zerobus, Part of Lakeflow Connect

2025-06-12 · Data + AI Summit 2025

talk

by Victoria Bukta (Shopify) , Nikola Obradovic (Databricks)

AI/ML Analytics API IoT Data Streaming

In this session, we’ll introduce Zerobus Direct Write API, part of Lakeflow Connect, which enables you to push data directly to your lakehouse and simplify ingestion for IOT, clickstreams, telemetry, and more. We’ll start with an overview of the ingestion landscape to date. Then, we'll cover how you can “shift left” with Zerobus, embedding data ingestion into your operational systems to make analytics and AI a core component of the business, rather than an afterthought. The result is a significantly simpler architecture that scales your operations, using this new paradigm to skip unnecessary hops. We'll also highlight one of our early customers, Joby Aviation and how they use Zerobus. Finally, we’ll provide a framework to help you understand when to use Zerobus versus other ingestion offerings—and we’ll wrap up with a live Q&A so that you can hit the ground running with your own use cases.

Sponsored by: Soda Data Inc. | Clean Energy, Clean Data: How Data Quality Powers Decarbonization

Building Responsible AI Agents on Databricks

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Pavithra Rao (Databricks) , Yassine Essawabi (Databricks)

AI/ML BI Databricks LLM Cyber Security

This presentation explores how Databricks' Data Intelligence Platform supports the development and deployment of responsible AI in credit decisioning, ensuring fairness, transparency and regulatory compliance. Key areas include bias and fairness monitoring using Lakehouse Monitoring to track demographic metrics and automated alerts for fairness thresholds. Transparency and explainability are enhanced through the Mosaic AI Agent Framework, SHAP values and LIME for feature importance auditing. Regulatory alignment is achieved via Unity Catalog for data lineage and AIBI dashboards for compliance monitoring. Additionally, LLM reliability and security are ensured through AI guardrails and synthetic datasets to validate model outputs and prevent discriminatory patterns. The platform integrates real-time SME and user feedback via Databricks Apps and AI/BI Genie Space.

Get the Most of Your Delta Lake

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Youssef Mrini (Databricks)

Analytics Data Management Delta Spark

Unlock the full potential of Delta Lake, the open-source storage framework for Apache Spark, with this session focused on its latest and most impactful features. Discover how capabilities like Time Travel, Column Mapping, Deletion Vectors, Liquid Clustering, UniForm interoperability, and Change Data Feed (CDF) can transform your data architecture. Learn not just what these features do, but when and how to use them to maximize performance, simplify data management, and enable advanced analytics across your lakehouse environment.

Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Tim Kessler (Redox, Inc.) , Matthew Giglia (Databricks)

AI/ML API Amazon EMR BI Databricks Delta ETL/ELT SQL Data Streaming

Redox & Databricks direct integration can streamline your interoperability workflows from responding in record time to preauthorization requests to letting attending physicians know about a change in risk for sepsis and readmission in near real time from ADTs. Data engineers will learn how to create fully-streaming ETL pipelines for ingesting, parsing and acting on insights from Redox FHIR bundles delivered directly to Unity Catalog volumes. Once available in the Lakehouse, AI/BI Dashboards and Agentic Frameworks help write FHIR messages back to Redox for direct push down to EMR systems. Parsing FHIR bundle resources has never been easier with SQL combined with the new VARIANT data type in Delta and streaming table creation against Serverless DBSQL Warehouses. We'll also use Databricks accelerators dbignite and redoxwrite for writing and posting FHIR bundles back to Redox integrated EMRs and we'll extend AI/BI with Unity Catalog SQL UDFs and the Redox API for use in Genie.

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Alex Dean (Snowplow)

AI/ML Analytics Data Collection Databricks Delta Snowplow Data Streaming

In the competitive gaming industry, understanding player behavior is key to delivering engaging experiences. Supercell, creators of Clash of Clans and Brawl Stars, faced challenges with fragmented data and limited visibility into user journeys. To address this, they partnered with Snowplow and Databricks to build a scalable, privacy-compliant data platform for real-time insights. By leveraging Snowplow’s behavioral data collection and Databricks’ Lakehouse architecture, Supercell achieved: Cross-platform data unification: A unified view of player actions across web, mobile and in-game Real-time analytics: Streaming event data into Delta Lake for dynamic game balancing and engagement Scalable infrastructure: Supporting terabytes of data during launches and live events AI & ML use cases: Churn prediction and personalized in-game recommendations This session explores Supercell’s data journey and AI-driven player engagement strategies.

ClickHouse and Databricks for Real-Time Analytics

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Melvyn Peignon (ClickHouse)

Analytics ClickHouse Databricks Delta

ClickHouse is a C++ based, column-oriented database built for real-time analytics. While it has its own internal storage format, the rise of open lakehouse architectures has created a growing need for seamless interoperability. In response, we have developed integrations with your favorite lakehouse ecosystem to enhance compatibility, performance and governance. From integrating with Unity Catalog to embedding the Delta Kernel into ClickHouse, this session will explore the key design considerations behind these integrations, their benefits to the community, the lessons learned and future opportunities for improved compatibility and seamless integration.

End-to-End Interoperable Data Platform: How Bosch Leverages Databricks Supply Chain Consolidation

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Satish Karunakaran (Robert Bosch GmbH) , Marc-Alexander Frey (Robert Bosch GmbH)

Databricks dbt LLM

This session will showcase Bosch’s journey in consolidating supply chain information using the Databricks platform. It will dive into how Databricks not only acts as the central data lakehouse but also integrates seamlessly with transformative components such as dbt and Large Language Models (LLMs). The talk will highlight best practices, architectural considerations, and the value of an interoperable platform in driving actionable insights and operational excellence across complex supply chain processes. Key Topics and Sections Introduction & Business Context Brief Overview of Bosch’s Supply Chain Challenges and the Need for a Consolidated Data Platform. Strategic Importance of Data-Driven Decision-Making in a Global Supply Chain Environment. Databricks as the Core Data Platform Integrating dbt for Transformation Leveraging LLM Models for Enhanced Insights

Extending the Lakehouse: Power Interoperable Compute With Unity Catalog Open APIs

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Tathagata Das (Databricks) , Michelle Leon (Databricks)

Flink API DuckDB Iceberg Cyber Security Spark Trino

The lakehouse is built for storage flexibility, but what about compute? In this session, we’ll explore how Unity Catalog enables you to connect and govern multiple compute engines across your data ecosystem. With open APIs and support for the Iceberg REST Catalog, UC lets you extend access to engines like Trino, DuckDB, and Flink while maintaining centralized security, lineage, and interoperability. We will show how you can get started today working with engines like Apache Spark and Starburst to read and write to UC managed tables with some exciting demos. Learn how to bring flexibility to your compute layer—without compromising control.

Hands-on Learning: Databricks SQL in Action: Intelligent Data Warehousing, Analytics and BI Workshop (repeat)

2025-06-11 · Data + AI Summit 2025

workshop

by Pearl Ubaru (Databricks)

AI/ML Analytics BI Cloud Computing Data Lake Databricks DWH SQL

Most organizations run complex cloud data architectures that silo applications, users and data. Join this interactive hands-on workshop to learn how Databricks SQL allows you to operate a multi-cloud lakehouse architecture that delivers data warehouse performance at data lake economics — with up to 12x better price/performance than traditional cloud data warehouses. Here’s what we’ll cover: How Databricks SQL fits in the Data Intelligence Platform, enabling you to operate a multicloud lakehouse architecture that delivers data warehouse performance at data lake economics How to manage and monitor compute resources, data access and users across your lakehouse infrastructure How to query directly on your data lake using your tools of choice or the built-in SQL editor and visualizations How to use AI to increase productivity when querying, completing code or building dashboards Ask your questions during this hands-on lab, and the Databricks experts will guide you.

HP's Data Platform Migration Journey: Redshift to Lakehouse

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Isaac Chan (HP Inc.) , Kavya Atmakuri (HP Inc.)

AWS Databricks dbt ETL/ELT Redshift SQL

HP Print's data platform team took on a migration from a monolithic, shared resource of AWS Redshift, to a modular and scalable data ecosystem on Databricks lakehouse. The result was 30–40% cost savings, scalable and isolated resources for different data consumers and ETL workloads, and performance optimization for a variety of query types. Through this migration, there were technical challenges and learnings relating to the ETL migrations with DBT, new Databricks features like Liquid Clustering, predictive optimization, Photon, SQL serverless warehouses, managing multiple teams on Unity Catalog, and others. This presentation dives into both the business and technical sides of this migration. Come along as we share our key takeaways from this journey.

Summit Live: OLTP for the Lakehouse

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Dave Nettleton (Databricks)

AI/ML Databricks

Analytical and operational use cases are starting to converge, and AI-assisted applications are accelerating the trend. Most applications require a transactional, OLTP database to power data. Hear from a Databricks expert on the latest developments and our strategy for operational data integrated into the lakehouse.

talk-data.com

Activity Trend

Top Events

Top Speakers

Capitalizing Alternatives Data on the Addepar Platform: Private Markets Benchmarking

Sponsored by: C2S Technologies Inc. | Qbeast: Lakehouse Acceleration as a Service

Welcome Lakehouse, from a DWH transformation to a M&A data sharing

Daft and Unity Catalog: A Multimodal/AI-Native Lakehouse

Databricks + Apache Iceberg™: Managed and Foreign Tables in Unity Catalog

Rust and Lakehouse Format — Ask Us Anything

Sponsored by: DataHub | Beyond the Lakehouse: Supercharging Databricks with Contextual Intelligence

Sponsored by: definity | How You Could Be Saving 50% of Your Spark Costs

Eliminate Hops in Your Streaming Architecture with Zerobus, Part of Lakeflow Connect

Sponsored by: Soda Data Inc. | Clean Energy, Clean Data: How Data Quality Powers Decarbonization

Building Responsible AI Agents on Databricks

Get the Most of Your Delta Lake

Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

ClickHouse and Databricks for Real-Time Analytics

End-to-End Interoperable Data Platform: How Bosch Leverages Databricks Supply Chain Consolidation

Extending the Lakehouse: Power Interoperable Compute With Unity Catalog Open APIs

Hands-on Learning: Databricks SQL in Action: Intelligent Data Warehousing, Analytics and BI Workshop (repeat)

HP's Data Platform Migration Journey: Redshift to Lakehouse

Summit Live: OLTP for the Lakehouse