talk-data.com

Adi Polak

Speaker


1 talk

VP of Developer Experience, Treeverse

Adi Polak is an experienced software engineer and people manager focused on data, AI, and machine learning for operations and analytics. She has built algorithms and distributed data pipelines using Spark, Kafka, HDFS, and large-scale systems, and has led teams to deliver pioneering ML initiatives. An accomplished educator, she has taught thousands of students how to scale machine learning with Spark and is the author of Scaling Machine Learning with Spark and High Performance Spark (2nd Edition). Earlier this year, she began exploring data streaming with Flink and ML inference, focusing on high-performance, end-to-end systems.

Bio from: Databricks DATA + AI Summit 2023

Filtering by: Data + AI Summit 2025


Talks & appearances

Showing 1 of 10 activities

No More Fragile Pipelines: Kafka and Iceberg the Declarative Way

Moving data between operational systems and analytics platforms is often painful. Traditional pipelines become complex, brittle, and expensive to maintain. Take Kafka and Iceberg: batching on Kafka causes ingestion bottlenecks, while streaming-style writes to Iceberg create too many small Parquet files, cluttering metadata, degrading queries, and increasing maintenance overhead. Frequent updates further strain background table operations, causing retries, even before dealing with schema evolution.

But much of this complexity is avoidable. What if Kafka topics and Iceberg tables were treated as two sides of the same coin? By establishing a transparent equivalence between them, we can rethink pipeline design entirely.

This session introduces Tableflow, a new approach to bridging streaming and table-based systems. It shifts complexity out of individual pipelines and into a unified layer, enabling simpler, declarative workflows. We'll cover schema evolution, compaction, topic-to-table mapping, and how to continuously materialize and optimize thousands of topics as Iceberg tables. Whether you are modernizing an existing stack or starting fresh, you'll leave with practical insights for building resilient, scalable, and future-proof data architectures.
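To make the small-file problem the abstract mentions concrete, here is a rough back-of-envelope sketch. The commit intervals, partition counts, and the one-file-per-partition-per-commit assumption are all illustrative and not taken from the talk; real Iceberg writers vary, but the arithmetic shows why frequent streaming commits multiply file counts.

```python
# Illustrative arithmetic: why frequent streaming commits to an Iceberg
# table produce far more small Parquet files than coarse batch commits.
# Assumption (hypothetical): each commit writes at least one data file
# per active partition.

SECONDS_PER_DAY = 86_400

def files_per_day(commit_interval_s: int, partitions: int) -> int:
    """Lower bound on data files written per day under the assumption above."""
    commits = SECONDS_PER_DAY // commit_interval_s
    return commits * partitions

# Streaming-style writes: commit every 60 s across 8 partitions.
streaming_files = files_per_day(60, 8)     # 1440 commits x 8 partitions

# Batch-style writes: commit once per hour across the same 8 partitions.
batch_files = files_per_day(3_600, 8)      # 24 commits x 8 partitions

print(streaming_files)  # 11520
print(batch_files)      # 192
```

Each of those thousands of daily files also adds manifest entries to table metadata, which is why compaction (rewriting many small files into fewer large ones) becomes the background maintenance burden, and source of retries, that the abstract describes.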