talk-data.com

Topic

Hudi

table_format data_lake open_table_format

4 tagged

Activity Trend: 14 peak/qtr, 2020-Q1 to 2026-Q1

Activities

4 activities · Newest first

The Evolution of Delta Lake from Data + AI Summit 2024

Shant Hovsepian, Chief Technology Officer of Data Warehousing at Databricks, explains why Delta Lake is the most adopted open lakehouse format.

Includes:
- Delta Lake UniForm GA (support for and compatibility with Hudi, Apache Iceberg, Delta)
- Delta Lake Liquid Clustering
- Delta Lake production-ready catalog (Iceberg REST API)
- The growth and strength of the Delta ecosystem
- Delta Kernel
- DuckDB integration with Delta
- Delta 4.0

Introducing Universal Format: Iceberg and Hudi Support in Delta Lake

In this session, we will talk about how Delta Lake plans to integrate with Iceberg and Hudi. Customers are being forced to choose storage formats based on the tools that support them rather than choosing the most performant and functional format for their lakehouse architecture. With Universal Format (“UniForm”), Delta removes the need to make this compromise and makes Delta tables compatible with Iceberg and Hudi query engines. We will do a technical deep dive of the technology, demo it, and discuss the roadmap.

Talk by: Himanshu Raja and Ryan Johnson

Petabyte-scale lakehouses with dbt and Apache Hudi

While the data lakehouse architecture offers many inherent benefits, it’s still relatively new to the dbt community, which creates hurdles to adoption.

In this talk, you’ll meet Apache Hudi, a platform used by organizations to build planet-scale data platforms according to all of the key design elements required by the lakehouse architecture. You’ll also learn how we’ve personally used Hudi, along with dbt, Spark, Airflow, and many more open-source tools, to build a truly reliable big data streaming lakehouse that cut the latency of our petabyte-scale data pipelines from hours to minutes.

Check the slides here: https://docs.google.com/presentation/d/18dv4TZzRnZQ-IK7xLkYJuind4Bcztkl19zV7b4HTaTU/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.