talk-data.com

Adi Polak

Speaker

VP of Developer Experience, Treeverse

Adi Polak is an experienced software engineer and people manager focused on data, AI, and machine learning for operations and analytics. She has built algorithms and distributed data pipelines using Spark, Kafka, HDFS, and large-scale systems, and has led teams to deliver pioneering ML initiatives. An accomplished educator, she has taught thousands of students how to scale machine learning with Spark, and she is the author of Scaling Machine Learning with Spark and a co-author of High Performance Spark (2nd Edition). Earlier this year, she began exploring data streaming with Flink and ML inference, focusing on high-performance, end-to-end systems.

Bio from: Databricks DATA + AI Summit 2023

Talks & appearances

Big Data LDN 2025

Moving data between operational systems and analytics platforms is often a painful process. Traditional pipelines that transfer data in and out of warehouses tend to become complex, brittle, and expensive to maintain over time.

Much of this complexity, however, is avoidable. Data in motion and data at rest—Kafka Topics and Iceberg Tables—can be treated as two sides of the same coin. By establishing an equivalence between Topics and Tables, it’s possible to transparently map between them and rethink how pipelines are built.

This talk introduces a declarative approach to bridging streaming and table-based systems. By shifting complexity into the data layer, we can decompose complex, imperative pipelines into simpler, more reliable workflows.
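To make the declarative idea concrete, here is a minimal Python sketch of what a topic-to-table declaration could look like. The TopicTableMapping class, its fields, and the topic and table names are hypothetical illustrations, not an existing API.

    # Hypothetical sketch of a declarative topic-to-table mapping; the names
    # below are illustrative, not part of any existing library.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TopicTableMapping:
        """Declares that a Kafka topic should be materialized as an Iceberg table."""
        topic: str                # source Kafka topic
        table: str                # target Iceberg table (catalog.db.table)
        partition_by: tuple = ()  # Iceberg partition transforms, e.g. ("days(ts)",)

    # The pipeline engine, not the user, derives the imperative plumbing
    # (consumers, writers, compaction, schema evolution) from declarations like these.
    MAPPINGS = [
        TopicTableMapping(topic="orders", table="lake.sales.orders", partition_by=("days(ts)",)),
        TopicTableMapping(topic="clicks", table="lake.web.clicks"),
    ]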

We’ll explore the design principles behind this approach, including schema mapping and evolution between Kafka and Iceberg, and how to build a system that can continuously materialize and optimize hundreds of thousands of topics as Iceberg tables.
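As a rough illustration of what continuous materialization involves for a single topic, the sketch below uses Spark Structured Streaming to append a Kafka topic into an Iceberg table. It assumes the Iceberg Spark runtime is on the classpath, and the catalog, topic, paths, and message schema are made up for the example.

    # Minimal Kafka-to-Iceberg materialization sketch using Spark Structured Streaming.
    # Catalog, topic, paths, and the message schema are assumptions for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    spark = (
        SparkSession.builder
        .appName("kafka-to-iceberg-sketch")
        .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lake.type", "hadoop")
        .config("spark.sql.catalog.lake.warehouse", "/tmp/warehouse")
        .getOrCreate()
    )

    # Assumed JSON payload schema; in practice this would come from a schema registry.
    event_schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", LongType()),
        StructField("ts", StringType()),
    ])

    # Read the topic as a stream and project the payload into columns.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "orders")
        .load()
        .select(from_json(col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

    # Continuously append the stream into the Iceberg table.
    (
        events.writeStream
        .format("iceberg")
        .outputMode("append")
        .option("checkpointLocation", "/tmp/checkpoints/orders")
        .toTable("lake.db.orders")
        .awaitTermination()
    )

Running one such job per topic is exactly the kind of imperative plumbing a declarative mapping layer is meant to generate and manage at scale.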

Whether you're building new pipelines or modernizing legacy systems, this session will provide practical patterns and strategies for creating resilient, scalable, and future-proof data architectures.