Iceberg

Databricks + Apache Iceberg™: Managed and Foreign Tables in Unity Catalog

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Jonathan Brito (Databricks)

AWS AWS Glue Data Lakehouse Databricks Delta Hive Snowflake

Unity Catalog support for Apache Iceberg™ brings open, interoperable table formats to the heart of the Databricks Lakehouse. In this session, we’ll introduce new capabilities that allow you to write Iceberg tables from any REST-compatible engine, apply fine-grained governance across all data, and unify access to external Iceberg catalogs like AWS Glue, Hive Metastore, and Snowflake Horizon. Learn how Databricks is eliminating data silos, simplifying performance with Predictive Optimization, and advancing a truly open lakehouse architecture with Delta and Iceberg side by side.

Iceberg Geo Type: Transforming Geospatial Data Management at Scale

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Szehon Ho (Databricks), Jia Yu (Wherobots Inc.)

Analytics Data Management DWH Spark

The Apache Iceberg™ community is introducing native geospatial type support, addressing key challenges in managing geospatial data at scale, including fragmented formats and inefficiencies in storing large spatial datasets. This talk will delve into the origins of the Iceberg geo type, its specification design and future goals. We will examine the impact on both the geospatial and Iceberg communities: introducing a standard data warehouse storage layer to the geospatial community, and enabling optimized geospatial analytics for Iceberg users. We will also present a live demonstration of the Iceberg geo data type with Apache Sedona™ and Apache Spark™, showcasing how it simplifies and accelerates geospatial analytics workflows and queries. Finally, we will provide an in-depth look at its current capabilities, outline the roadmap for future developments, and offer a perspective on its role in advancing geospatial data management in the industry.

Rust and Lakehouse Format — Ask Us Anything

2025-06-12 · Data + AI Summit 2025

lightning_talk

by Robert Pack (Databricks), Denny Lee (Databricks), Tyler Croy (Scribd, Inc.)

Data Lakehouse Delta Rust

Join us for an in-depth Ask Me Anything (AMA) on how Rust is revolutionizing Lakehouse formats like Delta Lake and Apache Iceberg through projects like delta-rs and iceberg-rs! Discover how Rust’s memory safety, zero-cost abstractions and fearless concurrency unlock faster development and higher-performance data operations. Whether you’re a data engineer, Rustacean or Lakehouse enthusiast, bring your questions on how Rust is shaping the future of open table formats!

Sponsored by: Redpanda | IoT for Fun & Prophet: Scaling IoT and predicting the future with Redpanda, Iceberg & Prophet

The Future of Open Table Formats: Delta Lake, Iceberg, and More

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Daniel Weeks (Databricks), Ryan Blue (Tabular)

Delta

Open table formats are evolving quickly. In this session, we’ll explore the latest features of Delta Lake and Apache Iceberg™, including a look at the emerging Iceberg v3 specification. Join us to learn about what’s driving format innovation, how interoperability is becoming real, and what it means for the future of data architecture.

Incremental Iceberg Table Replication at Scale

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Hongyue Hongyue (Self-Employed), Szehon Ho (Databricks)

Spark

Apache Iceberg is a popular table format for managing large analytical datasets. But replicating Iceberg tables at scale can be a daunting task, especially when dealing with its hierarchical metadata. In this talk, we present an end-to-end workflow for replicating Apache Iceberg tables, leveraging Apache Spark to ensure that backup tables remain identical to their source counterparts. We have also contributed these libraries back to the open-source community. Attendees will gain a comprehensive understanding of how to set up replication workflows for Iceberg tables, as well as practical guidance on how to manage and maintain replicated datasets at scale. This talk is ideal for data engineers, platform architects and practitioners looking to apply replication and disaster recovery for Apache Iceberg in complex data ecosystems.

Sponsored by: Google Cloud | Powering AI & Analytics: Innovations in Google Cloud Storage for Data Lakes

Extending the Lakehouse: Power Interoperable Compute With Unity Catalog Open APIs

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Tathagata Das (Databricks), Michelle Leon (Databricks)

Flink API Data Lakehouse DuckDB Cyber Security Spark Trino

The lakehouse is built for storage flexibility, but what about compute? In this session, we’ll explore how Unity Catalog enables you to connect and govern multiple compute engines across your data ecosystem. With open APIs and support for the Iceberg REST Catalog, UC lets you extend access to engines like Trino, DuckDB, and Flink while maintaining centralized security, lineage, and interoperability. We will show how you can get started today working with engines like Apache Spark and Starburst to read and write to UC managed tables with some exciting demos. Learn how to bring flexibility to your compute layer—without compromising control.

Iceberg Table Format Adoption and Unified Metadata Catalog Implementation in Lakehouse Platform

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Ruotian Wang (DoorDash), Sergey Zavgorodni (DoorDash)

Amazon EMR Data Lake Data Lakehouse Databricks DWH Snowflake Trino

The DoorDash Data organization is actively adopting the lakehouse paradigm. This presentation describes the methodology that allows classic data warehouse and data lake platforms to be migrated to a unified lakehouse solution. The objectives of this effort include: elimination of excessive data movement; seamless integration and consolidation of the query engine layers, including Snowflake, Databricks, EMR and Trino; query performance optimization; abstracting away the complexity of underlying storage layers and table formats; and a strategic and justified decision on the unified metadata catalog used across various compute platforms.

No More Fragile Pipelines: Kafka and Iceberg the Declarative Way

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Adi Polak (Treeverse)

Analytics Kafka Parquet Data Streaming

Moving data between operational systems and analytics platforms is often painful. Traditional pipelines become complex, brittle, and expensive to maintain. Take Kafka and Iceberg: batching on Kafka causes ingestion bottlenecks, while streaming-style writes to Iceberg create too many small Parquet files, cluttering metadata, degrading queries, and increasing maintenance overhead. Frequent updates further strain background table operations, causing retries, even before dealing with schema evolution. But much of this complexity is avoidable. What if Kafka Topics and Iceberg Tables were treated as two sides of the same coin? By establishing a transparent equivalence, we can rethink pipeline design entirely. This session introduces Tableflow, a new approach to bridging streaming and table-based systems. It shifts complexity away from pipelines and into a unified layer, enabling simpler, declarative workflows. We’ll cover schema evolution, compaction, topic-to-table mapping, and how to continuously materialize and optimize thousands of topics as Iceberg tables. Whether modernizing or starting fresh, you’ll leave with practical insights for building resilient, scalable, and future-proof data architectures.

Master Schema Translations in the Era of Open Data Lake

2025-06-11 · Data + AI Summit 2025 Watch

lightning_talk

by Eric Sun (Coinbase)

Data Lake Databricks Delta DynamoDB ETL/ELT Kafka MongoDB Snowflake postgresql

Unity Catalog puts a variety of schemas into a centralized repository. Now the developer community wants more productivity and automation for schema inference, translation, evolution and optimization, especially for ingestion and reverse-ETL scenarios with more code generation. Coinbase Data Platform attempts to pave a path with "Schemaster", which interacts with the data catalog through a (proposed) metadata model to make schema translation and evolution more manageable across some of the popular systems, such as Delta, Iceberg, Snowflake, Kafka, MongoDB, DynamoDB, Postgres... This Lightning Talk covers 4 areas: the complexity and caveats of schema differences among these systems; the proposed field-level metadata model and 2 translation patterns, point-to-point vs hub-and-spoke; why data profiling should be augmented to enhance schema understanding and translation; and integrating it with ingestion and reverse-ETL in a Databricks-oriented ecosystem. Takeaway: standardize schema lineage & translation.

Learning from Goldman Sachs' Legend Lakehouse for Data Governance

2025-06-11 · Data + AI Summit 2025

talk

by George Wu (Goldman Sachs), Abhishek Narang (Goldman Sachs)

Analytics Data Contracts Data Governance Data Lakehouse Databricks OTF

Data is the backbone of modern decision-making, but centralizing it is only the tip of the iceberg. Entitlements, secure sharing and just-in-time availability are critical challenges to any large-scale platform. Join Goldman Sachs as we reveal how our Legend Lakehouse, coupled with Databricks, overcomes these hurdles to deliver high-quality, governed data at scale. By leveraging an open table format (Apache Iceberg) and open catalog format (Unity Catalog), we ensure platform interoperability and vendor neutrality. Databricks Unity Catalog then provides a robust entitlement system that aligns with our data contracts, ensuring consistent access control across producer and consumer workspaces. Finally, Legend functions, integrating with Databricks User Defined Functions (UDF), offer real-time data enrichment and secure transformations without exposing raw datasets. Discover how these components unite to streamline analytics, bolster governance and power innovation.

Cross-Cloud Data Mesh with Delta Sharing and UniForm in Mercedes-Benz

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Aleksandar Dragojevic (Databricks), Alexander Summa (Mercedes-Benz Group AG)

AWS AWS Glue Azure Azure DevOps Cloud Computing Delta DevOps

In this presentation, we'll show how we achieved a unified development experience for teams working on Mercedes-Benz Data Platforms in AWS and Azure. We will demonstrate how we implemented Azure to AWS and AWS to Azure data product sharing (using Delta Sharing and Cloud Tokens), integration with AWS Glue Iceberg tables through UniForm and automation to drive everything using Azure DevOps Pipelines and DABs. We will also show how to monitor and track cloud egress costs and how we present a consolidated view of all the data products and relevant cost information. The end goal is to show how customers can offer the same user experience to their engineers and not have to worry about which cloud or region the Data Product lives in. Instead, they can enroll in the data product through self-service and have it available to them in minutes, regardless of where it originates.

Multi-Format, Multi-Table, Multi-Statement Transactions on Unity Catalog

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Prakhar Jain (Databricks), Michelle Leon (Databricks)

Data Lakehouse Databricks Delta

Get a first look at multi-statement transactions in Databricks. In this session, we will dive into their capabilities, exploring how multi-statement transactions enable atomic updates across multiple tables in your data pipelines, ensuring data consistency and integrity for complex operations. We will also share how we are enabling unified transactions across Delta Lake and Iceberg with Unity Catalog — powering our vision for an open and interoperable lakehouse.

Sponsored by: Firebolt | 10ms Queries on Iceberg: Turbocharging Your Lakehouse for Interactive Experiences with Firebolt

Unlocking the Power of Iceberg: Our Journey to a Unified Lakehouse on Databricks

2025-06-10 · Data + AI Summit 2025 Watch

lightning_talk

by Tomer Sabag (LSports)

Data Engineering Data Lakehouse Databricks

This session showcases our journey of adopting Apache Iceberg™ to build a modern lakehouse architecture and leveraging Databricks' advanced Iceberg support to take it to the next level. We’ll dive into the key design principles behind our lakehouse, the operational challenges we tackled and how Databricks enabled us to unlock enhanced performance, scalability and streamlined data workflows. Whether you’re exploring Apache Iceberg™ or building a lakehouse on Databricks, this session offers actionable insights, lessons learned and best practices for modern data engineering.

Introduction to Modern Open Table Formats and Catalogs

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Bart Samwel (Databricks), Sirui Sun (Databricks)

AI/ML Delta

In this session, learn why modern open table formats like Delta and Iceberg are a big deal and how they work with catalogs. Learn what motivated their creation, how they work, and what benefits they can bring to your data and AI platform. Hear how these formats are becoming increasingly interoperable and what our vision is for their future.

A Unified Solution for Data Management and Model Training With Apache Iceberg and Mosaic Streaming

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Zilong Zhou (ByteDance)

Data Management Data Streaming

This session introduces ByteDance’s challenges in data management and model training, and how the team addresses them with Magnus (enhanced Apache Iceberg) and Byted Streaming (customized Mosaic Streaming). Magnus uses Iceberg’s branch/tag to manage massive datasets/checkpoints efficiently. With enhanced metadata and a custom C++ data reader, Magnus achieves optimal sharding, shuffling and data loading. Flexible table migration, detailed metrics and built-in full-text indexes on Iceberg tables further ensure training reliability. When training with ultra-large datasets, ByteDance faced scalability and performance issues. Given Streaming's scalability in distributed training and good code structure, the team chose and customized it to resolve challenges like slow startup, high resource consumption, and limited data source compatibility. In this session, we will explore Magnus and Byted Streaming, discuss their enhancements and demonstrate how they enable efficient and robust distributed training.

Apache Iceberg with Unity Catalog at HelloFresh

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Max Schultze (HelloFresh), Adam Komisarek (HelloFresh)

Flink Data Lakehouse Databricks Delta Snowflake Spark

Table formats like Delta Lake and Iceberg have been game changers for pushing lakehouse architecture into modern enterprises. The acquisition of Tabular added Iceberg to the Databricks ecosystem, an open format that was already well supported by processing engines across the industry. At HelloFresh we are building a lakehouse architecture that integrates many touchpoints and technologies all across the organization. As such, we chose Iceberg as the table format to bridge the gaps in our decentrally managed tech landscape. We are leveraging Unity Catalog as the Iceberg REST catalog of choice for storing metadata and managing tables. In this talk we will outline our architectural setup between Databricks, Spark, Flink and Snowflake and will explain the native Unity Iceberg REST catalog, as well as catalog federation towards connected engines. We will highlight the impact on our business and discuss the advantages and lessons learned from our early adopter experience.

Breaking Silos: Enabling Databricks-Snowflake Interoperability With Iceberg and Unity Catalog

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Mohit Kumar (T-Mobile), Geoffrey Freeman (T-Mobile)

API Databricks Delta Cyber Security Snowflake

As data ecosystems grow more complex, organizations often struggle with siloed platforms and fragmented governance. In this session, we’ll explore how our team made Databricks the central hub for cross-platform interoperability, enabling seamless Snowflake integration through Unity Catalog and the Iceberg REST API. We’ll cover: why interoperability matters and the business drivers behind our approach; how Unity Catalog and UniForm simplify interoperability, allowing Databricks to expose an Iceberg REST API for external consumption; a technical deep dive into data sharing, query performance, and access control across Databricks and Snowflake; and lessons learned and best practices for building a multi-engine architecture while maintaining governance and efficiency. By leveraging UniForm, Delta, and Iceberg, we created a flexible, vendor-agnostic architecture that bridges Databricks and Snowflake without compromising performance or security.
