talk-data.com

Topic

Iceberg

Apache Iceberg

table_format data_lake schema_evolution file_format storage open_table_format

Activity Trend: peak of 39 activities per quarter, 2020-Q1 to 2026-Q2

Top Events

- Data Engineering Podcast: 65
- Data + AI Summit 2025: 23
- Big Data LDN 2025: 13
- dbt Coalesce 2025: 9
- O'Reilly Data Engineering Books: 9
- Databricks DATA + AI Summit 2023: 6
- Big Data & AI Paris 2025: 5
- AWS re:Invent 2024: 5
- Snowflake World Tour Berlin: 5
- Google Cloud Next '25: 4
- The Analytics Engineering Podcast: 4
- Big Data LDN 2024: 4

Top Speakers

- Tobias Macey: 65
- Yingjun Wu (RisingWave Labs): 5
- Tom Scott (Streambased): 5
- Tristan Handy (dbt Labs): 4
- Ryan Blue (Tabular): 4
- Adi Polak (Treeverse): 3
- Dipti Borkar (Microsoft): 3
- Alex Merced (Dremio): 3
- Holly Smith (Databricks): 3
- Julien Le Dem (Astronomer): 3
- Jean-Baptiste Onofre (Apache Software Foundation): 2
- Melvyn Peignon (ClickHouse): 2

Activities

Showing filtered results

Filtering by: Databricks DATA + AI Summit 2023

The Evolution of Delta Lake from Data + AI Summit 2024

2024-06-17 · Databricks DATA + AI Summit 2023

video

by Shant Hovsepian (Databricks)

AI/ML API Data Lakehouse Databricks Delta DuckDB DWH Hudi

Shant Hovsepian, Chief Technology Officer of Data Warehousing at Databricks, explains why Delta Lake is the most adopted open lakehouse format.

Includes:
- Delta Lake UniForm GA (support for and compatibility with Hudi, Apache Iceberg, Delta)
- Delta Lake Liquid Clustering
- Delta Lake production-ready catalog (Iceberg REST API)
- The growth and strength of the Delta ecosystem
- Delta Kernel
- DuckDB integration with Delta
- Delta 4.0

The Future of Lakehouse Format Interoperability with Ali Ghodsi and Ryan Blue at Data + AI Summit

2024-06-16 · Databricks DATA + AI Summit 2023

video

by Ryan Blue (Tabular) , Ali Ghodsi (Databricks)

AI/ML Data Lakehouse Databricks

Speakers:
- Ali Ghodsi, Co-founder and CEO, Databricks
- Ryan Blue, Creator of Apache Iceberg and co-founder of Tabular

US Army Corp of Engineers Enhanced Commerce & National Sec Through Data-Driven Geospatial Insight

2023-07-26 · Databricks DATA + AI Summit 2023

video

by Jeff Mroz

AI/ML Analytics API Cloud Computing Data Engineering Data Governance Data Lake Data Lakehouse Data Management Data Quality Databricks Delta +12 more

The US Army Corps of Engineers (USACE) is responsible for maintaining and improving nearly 12,000 miles of shallow-draft (9'-14') inland and intracoastal waterways, 13,000 miles of deep-draft (14' and greater) coastal channels, and 400 ports, harbors, and turning basins throughout the United States. Because these components of the national waterway network are considered assets to both US commerce and national security, they must be carefully managed to keep marine traffic operating safely and efficiently.

The National DQM Program is tasked with providing USACE a nationally standardized remote monitoring and documentation system across multiple vessel types with timely data access, reporting, dredge certifications, data quality control, and data management. Government systems have often lagged behind commercial systems in modernization efforts, and the emergence of the cloud and Data Lakehouse Architectures has empowered USACE to successfully move into the modern data era.

This session incorporates aspects of these topics:
- Data Lakehouse Architecture: Delta Lake, platform security and privacy, serverless, administration, data warehouse, Data Lake, Apache Iceberg, Data Mesh
- GIS: H3, MOSAIC, spatial analysis
- Data engineering: data pipelines, orchestration, CDC, medallion architecture, Databricks Workflows, data munging, ETL/ELT, lakehouses, data lakes, Parquet, Data Mesh, Apache Spark™ internals
- Data Streaming: Apache Spark Structured Streaming, real-time ingestion, real-time ETL, real-time ML, real-time analytics, and real-time applications, Delta Live Tables
- ML: PyTorch, TensorFlow, Keras, scikit-learn, Python and R ecosystems
- Data governance: security, compliance, RMF, NIST
- Data sharing: sharing and collaboration, delta sharing, data cleanliness, APIs

Talk by: Jeff Mroz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Introducing Universal Format: Iceberg and Hudi Support in Delta Lake

2023-07-26 · Databricks DATA + AI Summit 2023

video

by Himanshu Raja (Databricks) , Ryan Johnson (Databricks)

Data Lakehouse Databricks Delta Hudi

In this session, we will talk about how Delta Lake plans to integrate with Iceberg and Hudi. Customers are being forced to choose storage formats based on the tools that support them rather than choosing the most performant and functional format for their lakehouse architecture. With Universal Format (“UniForm”), Delta removes the need to make this compromise and makes Delta tables compatible with Iceberg and Hudi query engines. We will do a technical deep dive of the technology, demo it, and discuss the roadmap.

Talk by: Himanshu Raja and Ryan Johnson

Backfill Streaming Data Pipelines in Kappa Architecture

2022-07-19 · Databricks DATA + AI Summit 2023

video

Flink AWS Lambda Databricks DWH Kafka Data Streaming

Streaming data pipelines can fail for various reasons. Since sources such as Kafka topics often have limited retention, prolonged job failures can lead to data loss. Thus, streaming jobs need to be backfillable at all times to prevent data loss in case of failures. One solution is to increase the source's retention so that backfilling is simply replaying source streams, but extending Kafka retention is very costly for Netflix's data sizes. Another solution is to use source data stored in the DWH, an approach commonly known as the Lambda architecture. However, this method introduces significant code duplication, as it requires engineers to maintain a separate, equivalent batch job. At Netflix, we have created the Iceberg Source Connector to provide backfilling capabilities to Flink streaming applications. It allows Flink to stream data stored in Apache Iceberg while mirroring Kafka's ordering semantics, enabling us to backfill large-scale stateful Flink pipelines at low retention cost.
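
The ordering idea in this abstract can be illustrated with a small Python sketch. It is not the Iceberg Source Connector described in the talk: the `Record` type, its field names, and the `replay_in_kafka_order` function are hypothetical, and an in-memory list stands in for data read from an Iceberg table. The sketch assumes each archived record still carries the Kafka partition and offset it was originally read from, and it replays records so that every partition comes out in offset order, which is the ordering Kafka guarantees.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Record:
    """Hypothetical archived event; the field names are assumptions."""
    partition: int     # Kafka partition the event was originally read from
    offset: int        # offset within that partition
    timestamp_ms: int  # event timestamp in milliseconds
    value: str


def replay_in_kafka_order(archived: Iterable[Record]) -> Iterator[Record]:
    """Yield archived records so that each partition is replayed in offset order."""
    # Group records by their original partition.
    by_partition: Dict[int, List[Record]] = {}
    for record in archived:
        by_partition.setdefault(record.partition, []).append(record)

    # Restore the per-partition offset order, as a Kafka consumer would see it.
    streams = [
        sorted(records, key=lambda r: r.offset)
        for records in by_partition.values()
    ]

    # heapq.merge reads each stream strictly front to back, so per-partition
    # order is preserved; the timestamp only decides how partitions interleave.
    return heapq.merge(*streams, key=lambda r: r.timestamp_ms)
```

Calling `list(replay_in_kafka_order(records))` on a shuffled list returns every record with each partition's offsets ascending. Ordering across partitions is only approximate, which matches Kafka, where order is guaranteed within a partition but not between partitions.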

How Adobe migrated to a unified and open data Lakehouse to deliver personalization at scale.

2022-07-19 · Databricks DATA + AI Summit 2023

video

by David Weinstein (Adobe Experience Cloud)

Cloud Computing Data Lake Data Lakehouse Databricks Delta

In this keynote talk, David Weinstein, VP of Engineering for Adobe Experience Cloud, will share Adobe’s journey from a simple data lake to a unified, open Lakehouse architecture with Databricks. Adobe can now deliver personalized experiences at scale to diverse customers with greater speed, operational efficiency and faster innovation across the Experience Cloud portfolio. Learn why Adobe chose to migrate from Iceberg to Delta Lake to drive its open standard development and accelerate innovation of its Lakehouse, and how leveraging the Delta Lake table format has enabled techniques that support change data capture and significantly improve operational efficiency.