Topic

Hudi

table_format data_lake open_table_format

Activities: 2 tagged

Activity Trend: peak of 14 activities per quarter, 2020-Q1 to 2026-Q2

Top Events

Data Engineering Podcast: 35
O'Reilly Data Engineering Books: 2
Databricks DATA + AI Summit 2023: 2
The Joe Reis Show: 1
dbt Coalesce 2022: 1
Data Universe 2024: 1
Big Data LDN 2024: 1
DataTalks.Club: 1
Big Data LDN 2025: 1
Data Council Austin 2024 - Day 1: 1
Databricks London Meetup @ Big Data LDN 2025: 1
Meetup HumanTalks Paris @leboncoin: 1

Top Speakers

Tobias Macey: 35
Holly Smith (Databricks): 2
Dipti Borkar (Microsoft): 2
Vinoth Chandar (Uber): 2
Elad Eldor: 1
Lukas Fittl: 1
Colleen Tartow (Starburst Data): 1
Andrey Korchak (Monite): 1
Himanshu Raja (Databricks): 1
Paul Dix (InfluxData): 1
Gavi Regunath (Advancing Analytics): 1
Lalith Suresh: 1

Activities

2 activities · Newest first

Engineering Lakehouses with Open Table Formats

2025-12-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dipankar Mazumdar, Vinoth Govindarajan (Apple)

Airflow Flink Big Data Data Lakehouse Data Management dbt Delta Iceberg Python Spark data data-engineering +2 more

Engineering Lakehouses with Open Table Formats introduces the architecture and capabilities of open table formats like Apache Iceberg, Apache Hudi, and Delta Lake. The book guides you through the design, implementation, and optimization of lakehouses that can handle modern data processing requirements effectively with real-world practical insights.

What this Book will help me do

Understand the fundamentals of open table formats and their benefits in lakehouse architecture.
Learn how to implement performant data processing using tools like Apache Spark and Flink.
Master advanced topics like indexing, partitioning, and interoperability between data formats.
Explore data lifecycle management and integration with frameworks like Apache Airflow and dbt.
Build secure lakehouses with regulatory compliance using best practices detailed in the book.

Author(s)

Dipankar Mazumdar and Vinoth Govindarajan are seasoned professionals with extensive experience in big data processing and software architecture. They bring their expertise from working with data lakehouses and are known for their ability to explain complex technical concepts clearly. Their collaborative approach brings valuable insights into the latest trends in data management.

Who is it for?

This book is ideal for data engineers, architects, and software professionals aiming to master modern lakehouse architectures. If you are familiar with data lakes or warehouses and wish to transition to an open data architectural design, this book is suited for you. Readers should have basic knowledge of databases, Python, and Apache Spark for the best experience.

Apache Hudi: The Definitive Guide

2025-10-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rebecca Bilbro, Prashant Wason, Bhavani Sudha Saktheeswaran, Shiyan Xu

Analytics Data Lakehouse Hadoop Data Streaming apache-hive data data-engineering

Overcome challenges in building transactional guarantees on rapidly changing data by using Apache Hudi. With this practical guide, data engineers, data architects, and software architects will discover how to seamlessly build an interoperable lakehouse from disparate data sources and deliver faster insights using your query engine of choice.

Authors Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran, and Rebecca Bilbro provide practical examples and insights to help you unlock the full potential of data lakehouses for different levels of analytics, from batch to interactive to streaming. You'll also learn how to evaluate storage choices and leverage built-in automated table optimizations to build, maintain, and operate production data applications.

Understand the need for transactional data lakehouses and the challenges associated with building them
Explore data ecosystem support provided by Apache Hudi for popular data sources and query engines
Perform different write and read operations on Apache Hudi tables and effectively use them for various use cases, including batch and stream applications
Apply different storage techniques and considerations such as indexing and clustering to maximize your lakehouse performance
Build end-to-end incremental data pipelines using Apache Hudi for faster ingestion and fresher analytics