Delta

From Data Lake Entanglement to Data Mesh Decoupling: Scaling a Self-Service Data Platform

2025-12-09 · PyData Eindhoven 2025 Watch

talk

by Geert Jongen

Data Lake PySpark

Our data platform journey started with a classic data lake — easy to ingest, hard to evolve. As domains scaled, tight coupling across source systems, pipelines, and data products slowed everything down. In this talk, we share how we re-architected toward a domain-oriented data mesh using PySpark, Delta Lake and DQX to achieve true decoupling. Expect practical lessons on designing independent data products, managing lineage and governance, and scaling self-service without chaos.

Unleashing SAP Databricks on Azure: Modernize, analyze, and innovate

2025-11-18 · Microsoft Ignite 2025 Watch

breakout

by Shanku Niyogi (Databricks) , Anavi Nahar (Microsoft)

AI/ML Analytics Azure BI Cloud Computing Databricks SAP

SAP Databricks on Azure integrates Databricks Data Intelligence Platform with SAP Business Data Cloud, unifying SAP and external data for advanced analytics, AI, and ML. It enables building intelligent apps and actionable insights using trusted SAP and third-party business data. Available natively on Azure within SAP Business Data Cloud, it offers seamless access without data duplication via Delta Sharing. This session highlights automated forecasting, exploratory analysis, and BI use cases.

You Don’t Need Spark for That: Pythonic Data Lakehouse Workflows

2025-09-30 · PyData Paris 2025 Watch

talk

by Romain Clement

Data Lakehouse Python Spark

Have you ever spun up a Spark cluster just to update three rows in a Delta table? In this talk, we’ll explore how modern Python libraries can power lightweight, production-grade Data Lakehouse workflows—helping you avoid over-engineering your data stack.

Capitalizing Alternatives Data on the Addepar Platform: Private Markets Benchmarking

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Ricky D'Sa (Addepar)

Data Lakehouse Databricks

Addepar possesses an enormous private investment data set with 40% of the $7T assets on the platform allocated to alternatives. Leveraging the Addepar Data Lakehouse (ADL), built on Databricks, we have built a scalable data pipeline that assesses millions of private fund investment cash flows and translates it to a private fund benchmarks data offering. Investors on the Addepar platform can leverage this data seamlessly integrated against their portfolio investments and obtain actionable investment insights. At a high-level, this data offering consists of an extensive data aggregation, filtering, and construction logic that dynamically updates for clients through the Databricks job workflows. This derived dataset has gone through several iterations with investment strategists and academics that leveraged delta shared tables. Irrespective of the data source, the data pipeline coalesces all relevant cash flow activity against a unique identifier before constructing the benchmarks.

Welcome Lakehouse, from a DWH transformation to a M&A data sharing

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Gianfranco Arena (Dxc Technology)

AWS Data Lakehouse Databricks DWH

At DXC, we helped our customer FastWeb with their "Welcome Lakehouse" project - a data warehouse transformation from on-premises to Databricks on AWS. But the implementation became something more. Thanks to features such as Lakehouse Federation and Delta Sharing, from the first day of the Fastweb+Vodafone merger, we have been able to connect two different platforms with ease and make the business focus on the value of data and not on the IT integration. This session will feature our customer Alessandro Gattolin of Fastweb to talk about the experience.

Databricks + Apache Iceberg™: Managed and Foreign Tables in Unity Catalog

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Jonathan Brito (Databricks)

AWS AWS Glue Data Lakehouse Databricks Hive Iceberg Snowflake

Unity Catalog support for Apache Iceberg™ brings open, interoperable table formats to the heart of the Databricks Lakehouse. In this session, we’ll introduce new capabilities that allow you to write Iceberg tables from any REST-compatible engine, apply fine-grained governance across all data, and unify access to external Iceberg catalogs like AWS Glue, Hive Metastore, and Snowflake Horizon. Learn how Databricks is eliminating data silos, simplifying performance with Predictive Optimization, and advancing a truly open lakehouse architecture with Delta and Iceberg side by side.

Supercharging Sales Intelligence: Processing Billions of Events via Structured Streaming

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Anurag Bharati (DigiCert) , Nikita Raje (DigiCert)

API Databricks Cyber Security Spark Data Streaming

DigiCert is a digital security company that provides digital certificates, encryption and authentication services and serves 88% of the Fortune 500, securing over 28 billion web connections daily. Our project aggregates and analyzes certificate transparency logs via public APIs to provide comprehensive market and competitive intelligence. Instead of relying on third-party providers with limited data, our project gives full control, deeper insights and automation. Databricks has helped us reliably poll public APIs in a scalable manner that fetches millions of events daily, deduplicate and store them in our Delta tables. We specifically use Spark for parallel processing, structured streaming for real-time ingestion and deduplication, Delta tables for data reliability, pools and jobs to ensure our costs are optimized. These technologies help us keep our data fresh, accurate and cost effective. This data has helped our sales team with real-time intelligence, ensuring DigiCert's success.

Delta Lake Liquid Clustering: Lightning-Fast Queries on Massive Datasets

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Rahul Mahadev (Databricks) , Cindy Jiang (Databricks)

In this presentation, we’ll dive into the power of Liquid Clustering—an innovative, out-of-the-box solution that automatically tunes your data layout to scale effortlessly with your datasets. You’ll get a deep look at how Liquid Clustering works, along with real-world examples of customers leveraging it to unlock blazing-fast query performance on petabyte-scale datasets. We’ll also give you an exciting sneak peek into the roadmap ahead, with upcoming features and enhancements to come.

Tech Industry Session: Building Collaborative Ecosystems With Openness and Portability

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Matthew Houser (Tealium) , Bob Pisani (Addepar) , Adrian Bolosan (Databricks) , Davis Matson (Health Catalyst)

AI/ML Analytics Databricks

Join us to discover how leading tech companies accelerate growth using open ecosystems and built-on solutions to foster collaboration, accelerate innovation and create scalable data products. This session will explore how organizations use Databricks to securely share data, integrate with partners and enable teams to build impactful applications powered by AI and analytics. Topics include: Using Delta Sharing for secure, real-time data collaboration across teams and partners Embedding analytics and creating marketplaces to extend product capabilities Building with open standards and governance frameworks to ensure compliance without sacrificing agility Hear real-world examples of how open ecosystems empower organizations to widen the aperture on collaboration, driving better business outcomes. Walk away with insights into how open data sharing and built-on solutions can help your teams innovate faster at scale.

The Future of Open Table Formats: Delta Lake, Iceberg, and More

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Daniel Weeks (Databricks) , Ryan Blue (Tabular)

Iceberg

Open table formats are evolving quickly. In this session, we’ll explore the latest features of Delta Lake and Apache Iceberg™ , including a look at the emerging Iceberg v3 specification. Join us to learn about what’s driving format innovation, how interoperability is becoming real, and what it means for the future of data architecture.

Sponsored by: MathCo | Powering Contextualized Intelligence with NucliOS, MathCo’s Databricks-Native Platform

Founder discussion: Matei on UC, Data Intelligence and AI Governance

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Matei Zaharia (Databricks)

AI/ML Databricks LLM Spark

Matei is a legend of open source: he started the Apache Spark project in 2009, co-founded Databricks, and worked on other widely used data and AI software, including MLflow, Delta Lake, and Dolly. His most recent research is about combining large language models (LLMs) with external data sources, such as search systems, and improving their efficiency and result quality. This will be a conversation coverering the latest and greatest of UC, Data Intelligence, AI Governance, and more.

Summit Live: Data Sharing and Collaboration

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Zaheera Valani (Databricks)

AI/ML Analytics Databricks

Hear more on the latest in data collaboration, which is paramount to unlocking business success. Delta Sharing is an open-source approach to share and govern data, AI models, dashboards, and notebooks across clouds and platforms - without the costly need for replication. Databricks Clean Rooms provide safe hosting environments for data collaboration across companies, also without the costly duplication of data. And the Databricks Marketplace is the open marketplace for all your data, analytics, and AI needs.

Better Together: Change Data Feed in a Streaming Data Flow

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Mattias Moser (84.51 LLC) , Scott Gordon (84.51˚)

Marketing Data Streaming

Traditional streaming works great when your data source is append-only, but what if your data source includes updates and deletes? At 84.51 we used Lakeflow Declarative Pipelines and Delta Lake to build a streaming data flow that consumes inserts, updates and deletes while still taking advantage of streaming checkpoints. We combined this flow with a materialized view and Enzyme incremental refresh for a low-code, efficient and robust end-to-end data flow.We process around 8 million sales transactions each day with 80 million items purchased. This flow not only handles new transactions but also handles updates to previous transactions.Join us to learn how 84.51 combined change data feed, data streaming and materialized views to deliver a “better together” solution.84.51 is a retail insights, media & marketing company. We use first-party retail data from 60 million households sourced through a loyalty card program to drive Kroger’s customer-centric journey.

Collaborative Innovation: How to Spur Innovation While Driving Efficiency

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Anthony Meyers (Aon) , Paul McComish (Allianz) , Antoine Amend (Databricks)

Databricks

Collaboration is redefining efficiency in insurance. This session explores how technologies such as Databricks Delta Sharing, secure data clean rooms, and data marketplaces are empowering insurers to securely share and analyze data across organizational boundaries—without exposing sensitive information. Discover how these solutions streamline operations, enhance risk modeling with real-time data integration, and enable the creation of tailored products through multi-party collaboration. Learn how insurers are leveraging these collaborative data ecosystems to reduce costs, drive innovation, and deliver better customer outcomes, all while maintaining strong privacy and governance standards. Join us to see how embracing collaborative frameworks is helping insurers operate smarter, faster, and more efficiently.

Embracing Unity Catalog and Empowering Innovation With Genie Room

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Junghoon Lee (Bagelcode) , Soochang Chung (Bagelcode)

Databricks ETL/ELT Hive

Bagelcode, a leader in the social casino industry, has utilized Databricks since 2018 and manages over 10,000 tables via Hive Metastore. In 2024, we embarked on a transformative journey to resolve inefficiencies and unlock new capabilities. Over five months, we redesigned ETL pipelines with Delta Lake, optimized partitioned table logs and executed a seamless migration with minimal disruption. This effort improved governance, simplified management and unlocked Unity Catalog’s advanced features. Post-migration, we integrated the Genie Room with Slack to enable natural language queries, accelerating decision-making and operational efficiency. Additionally, a lineage-powered internal tool allowed us to quickly identify and resolve issues like backfill needs or data contamination. Unity Catalog has revolutionized our data ecosystem, elevating governance and innovation. Join us to learn how Bagelcode unlocked its data’s full potential and discover strategies for your own transformation.

Get the Most of Your Delta Lake

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Youssef Mrini (Databricks)

Analytics Data Lakehouse Data Management Spark

Unlock the full potential of Delta Lake, the open-source storage framework for Apache Spark, with this session focused on its latest and most impactful features. Discover how capabilities like Time Travel, Column Mapping, Deletion Vectors, Liquid Clustering, UniForm interoperability, and Change Data Feed (CDF) can transform your data architecture. Learn not just what these features do, but when and how to use them to maximize performance, simplify data management, and enable advanced analytics across your lakehouse environment.

Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Tim Kessler (Redox, Inc.) , Matthew Giglia (Databricks)

AI/ML API Amazon EMR BI Data Lakehouse Databricks ETL/ELT SQL Data Streaming

Redox & Databricks direct integration can streamline your interoperability workflows from responding in record time to preauthorization requests to letting attending physicians know about a change in risk for sepsis and readmission in near real time from ADTs. Data engineers will learn how to create fully-streaming ETL pipelines for ingesting, parsing and acting on insights from Redox FHIR bundles delivered directly to Unity Catalog volumes. Once available in the Lakehouse, AI/BI Dashboards and Agentic Frameworks help write FHIR messages back to Redox for direct push down to EMR systems. Parsing FHIR bundle resources has never been easier with SQL combined with the new VARIANT data type in Delta and streaming table creation against Serverless DBSQL Warehouses. We'll also use Databricks accelerators dbignite and redoxwrite for writing and posting FHIR bundles back to Redox integrated EMRs and we'll extend AI/BI with Unity Catalog SQL UDFs and the Redox API for use in Genie.

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Alex Dean (Snowplow)

AI/ML Analytics Data Collection Data Lakehouse Databricks Snowplow Data Streaming

In the competitive gaming industry, understanding player behavior is key to delivering engaging experiences. Supercell, creators of Clash of Clans and Brawl Stars, faced challenges with fragmented data and limited visibility into user journeys. To address this, they partnered with Snowplow and Databricks to build a scalable, privacy-compliant data platform for real-time insights. By leveraging Snowplow’s behavioral data collection and Databricks’ Lakehouse architecture, Supercell achieved: Cross-platform data unification: A unified view of player actions across web, mobile and in-game Real-time analytics: Streaming event data into Delta Lake for dynamic game balancing and engagement Scalable infrastructure: Supporting terabytes of data during launches and live events AI & ML use cases: Churn prediction and personalized in-game recommendations This session explores Supercell’s data journey and AI-driven player engagement strategies.

Optimizing EV Charging Experience: Machine Learning for Accurate Charge Time Estimation

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Sihang Chen (Rivian) , Mohammed Farag (Rivian Automotive, LLC)

AI/ML Data Governance

Accurate charge time estimation is key to vehicle performance and user experience. We developed a scalable ML model that enhances real-time charge predictions in vehicle controls. Traditional rule-based methods struggle with dynamic factors like environment, vehicle state, and charging conditions. Our adaptive ML solution improves accuracy by 10%. We use Unity Catalog for data governance, Delta Tables for storage, and Liquid Clustering for data layout. Job schedulers manage data processing, while AutoML accelerates model selection. MLflow streamlines tracking, versioning, and deployment. A dedicated serving endpoint enables A/B testing and real-time insights. As our data ecosystem grew, scalability became critical. Our flexible ML framework was integrated into vehicle control systems within months. With live accuracy tracking and software-driven blending, we support 50,000+ weekly charge sessions, improving energy management and user experience.

talk-data.com

Activity Trend

Top Events

Top Speakers

From Data Lake Entanglement to Data Mesh Decoupling: Scaling a Self-Service Data Platform

Unleashing SAP Databricks on Azure: Modernize, analyze, and innovate

You Don’t Need Spark for That: Pythonic Data Lakehouse Workflows

Capitalizing Alternatives Data on the Addepar Platform: Private Markets Benchmarking

Welcome Lakehouse, from a DWH transformation to a M&A data sharing

Databricks + Apache Iceberg™: Managed and Foreign Tables in Unity Catalog

Supercharging Sales Intelligence: Processing Billions of Events via Structured Streaming

Delta Lake Liquid Clustering: Lightning-Fast Queries on Massive Datasets

Tech Industry Session: Building Collaborative Ecosystems With Openness and Portability

The Future of Open Table Formats: Delta Lake, Iceberg, and More

Sponsored by: MathCo | Powering Contextualized Intelligence with NucliOS, MathCo’s Databricks-Native Platform

Founder discussion: Matei on UC, Data Intelligence and AI Governance

Summit Live: Data Sharing and Collaboration

Better Together: Change Data Feed in a Streaming Data Flow

Collaborative Innovation: How to Spur Innovation While Driving Efficiency

Embracing Unity Catalog and Empowering Innovation With Genie Room

Get the Most of Your Delta Lake

Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

Optimizing EV Charging Experience: Machine Learning for Accurate Charge Time Estimation