We were told to scale compute. But what if the real problem was never big data, but bad data access? In this talk, we'll unpack two powerful, often misunderstood techniques, projection pushdown and predicate pushdown, and explain why they matter more than ever in a world that wants lightweight, fast queries over large datasets. These optimizations aren't just academic: they're the difference between querying a terabyte in seconds versus minutes. We'll show how systems like Flink and DuckDB leverage these techniques, what limits them (hello, Protobuf), and how smart schema and storage design, especially in formats like Iceberg and Arrow, can unlock dramatic speed gains. Along the way, we'll highlight the importance of landing data in queryable formats, and why indexing and query engines matter just as much as compute. This talk is for anyone who wants to stop fully scanning their data lakes just to read one field.
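To make the two techniques concrete, here is a minimal sketch in Python using DuckDB over a hypothetical events.parquet file. Because Parquet is columnar and stores per-row-group statistics, DuckDB reads only the projected column from disk and skips row groups whose min/max statistics rule out the predicate, instead of scanning the whole file.

```python
import duckdb

# Projection pushdown: only the 'user_id' column is read from disk.
# Predicate pushdown: row groups whose min/max statistics for
# 'event_date' exclude the filter value are skipped entirely.
# ('events.parquet' and its columns are hypothetical.)
rows = duckdb.sql("""
    SELECT user_id
    FROM 'events.parquet'
    WHERE event_date = DATE '2024-06-01'
""").fetchall()

# EXPLAIN shows the projection and filter attached to the scan node,
# confirming the engine never materializes the full table.
duckdb.sql("""
    EXPLAIN SELECT user_id
    FROM 'events.parquet'
    WHERE event_date = DATE '2024-06-01'
""").show()
```

The same query over row-oriented or Protobuf-encoded data cannot skip work this way, which is exactly the limitation the talk calls out.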
Moving data between operational systems and analytics platforms is often a painful process. Traditional pipelines that transfer data in and out of warehouses tend to become complex, brittle, and expensive to maintain over time.
Much of this complexity, however, is avoidable. Data in motion and data at rest—Kafka Topics and Iceberg Tables—can be treated as two sides of the same coin. By establishing an equivalence between Topics and Tables, it’s possible to transparently map between them and rethink how pipelines are built.
This talk introduces a declarative approach to bridging streaming and table-based systems. By shifting complexity into the data layer, we can decompose complex, imperative pipelines into simpler, more reliable workflows.
We’ll explore the design principles behind this approach, including schema mapping and evolution between Kafka and Iceberg, and how to build a system that can continuously materialize and optimize hundreds of thousands of topics as Iceberg tables.
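As a rough illustration of that materialization step (a sketch, not the system presented in the talk), the following Python snippet consumes a batch from a Kafka topic with confluent-kafka and appends it to an Iceberg table with pyiceberg. The topic name, table identifier, and catalog configuration are all hypothetical, and the records are assumed to match the table's schema.

```python
import json

import pyarrow as pa
from confluent_kafka import Consumer
from pyiceberg.catalog import load_catalog

# Hypothetical topic/table pair: the topic 'clicks' materializes
# into the Iceberg table 'analytics.clicks'.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "iceberg-materializer",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["clicks"])

catalog = load_catalog("default")  # catalog settings come from pyiceberg config
table = catalog.load_table("analytics.clicks")

# Buffer a batch of messages before writing: every append is an Iceberg
# commit, so committing per message would flood the table with tiny
# Parquet files.
records = []
while len(records) < 10_000:
    msg = consumer.poll(timeout=1.0)
    if msg is None:
        break
    if msg.error():
        continue
    records.append(json.loads(msg.value()))

if records:
    # One append == one Iceberg commit with a handful of data files.
    table.append(pa.Table.from_pylist(records))

consumer.close()
```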
Whether you're building new pipelines or modernizing legacy systems, this session will provide practical patterns and strategies for creating resilient, scalable, and future-proof data architectures.
Moving data between operational systems and analytics platforms is often painful. Traditional pipelines become complex, brittle, and expensive to maintain. Take Kafka and Iceberg: batching on Kafka causes ingestion bottlenecks, while streaming-style writes to Iceberg create too many small Parquet files, cluttering metadata, degrading queries, and increasing maintenance overhead. Frequent updates further strain background table operations, causing retries, and that is before dealing with schema evolution.

But much of this complexity is avoidable. What if Kafka Topics and Iceberg Tables were treated as two sides of the same coin? By establishing a transparent equivalence, we can rethink pipeline design entirely.

This session introduces Tableflow, a new approach to bridging streaming and table-based systems. It shifts complexity away from pipelines and into a unified layer, enabling simpler, declarative workflows. We'll cover schema evolution, compaction, topic-to-table mapping, and how to continuously materialize and optimize thousands of topics as Iceberg tables.

Whether modernizing or starting fresh, you'll leave with practical insights for building resilient, scalable, and future-proof data architectures.
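For readers unfamiliar with the compaction mentioned above: Iceberg ships a stock maintenance procedure, rewrite_data_files, that merges small Parquet files into larger ones. Here is a hedged sketch, assuming a Spark session already configured with the Iceberg runtime and a catalog named demo; the table name is hypothetical.

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg Spark runtime and the 'demo' catalog are already
# configured through Spark config (see Iceberg's Spark documentation).
spark = SparkSession.builder.appName("iceberg-compaction").getOrCreate()

# Merge small data files into ~512 MB files. This is Iceberg's built-in
# maintenance procedure, exposed through Spark SQL; 'analytics.clicks'
# is a hypothetical table.
spark.sql("""
    CALL demo.system.rewrite_data_files(
        table => 'analytics.clicks',
        options => map('target-file-size-bytes', '536870912')
    )
""").show()
```

Running this on a schedule, or having the platform do it continuously as the session describes, keeps metadata small and query plans cheap.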