Streaming data with Apache Kafka® has become the backbone of modern applications. While streams are ideal for continuous data flow, they lack built-in querying capability. Unlike databases with indexed lookups, Kafka's append-only logs are designed for high-throughput processing, not for on-demand querying. This forces teams to build additional infrastructure to make streaming data queryable. Traditional approaches replicate the data into external stores: relational databases such as PostgreSQL for operational workloads, and object storage such as S3 queried with Flink, Spark, or Trino for analytical use cases. While these approaches can work, they deepen the divide between the operational and analytical estates, creating silos, complex ETL pipelines, and problems with schema mismatches, freshness, and failures.

In this session, we’ll explore and see live demos of solutions that unify the operational and analytical estates and eliminate data silos. We’ll start with stream processing using Kafka Streams, Apache Flink®, and SQL implementations, then cover integrating relational databases with real-time analytics databases such as Apache Pinot® and ClickHouse. Finally, we’ll dive into modern approaches such as Apache Iceberg® with Tableflow, which simplifies data preparation by representing Kafka topics and their associated schemas as Iceberg or Delta tables in a few clicks. There is no single right answer to this problem, so as responsible system builders we must understand the options and trade-offs to build robust architectures.
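As a taste of the stream-processing portion, here is a minimal sketch of the Flink SQL route: a Kafka topic is declared as a dynamic table and queried with a continuously maintained aggregate. The topic name, broker address, and schema are illustrative assumptions, not part of the demo itself.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class OrdersByRegion {
    public static void main(String[] args) {
        // Streaming-mode Table API environment
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.inStreamingMode());

        // Declare a Kafka topic as a dynamic table
        // (topic, broker, and columns are assumed for illustration)
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  order_id STRING," +
            "  region   STRING," +
            "  amount   DOUBLE," +
            "  ts TIMESTAMP_LTZ(3) METADATA FROM 'timestamp'" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'orders'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'scan.startup.mode' = 'earliest-offset'," +
            "  'format' = 'json'" +
            ")");

        // A continuously updated aggregate: the kind of on-demand answer
        // the raw append-only log cannot serve by itself
        tEnv.executeSql(
            "SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue " +
            "FROM orders GROUP BY region")
            .print();
    }
}
```

Kafka Streams, Pinot/ClickHouse integration, and Tableflow take different routes to the same goal: making topic data queryable without hand-rolled pipelines.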
An organisation's data has traditionally been split between the operational estate, for day-to-day business operations, and the analytical estate, for after-the-fact analysis and reporting. The journey from one side to the other is today a long and tortuous one. But does it have to be?

In the modern data stack, Apache Kafka is your de facto standard operational platform, and Apache Iceberg has emerged as the champion of table formats to power analytical applications. Can we leverage the best of Iceberg and Kafka to create a powerful solution greater than the sum of its parts?

Yes you can, and we did!

This isn't a typical story of connectors, ELT, and separate data stores. We've developed an advanced projection of Kafka data in an Iceberg-compatible format, allowing direct access from warehouses and analytical tools.

In this talk, we'll cover:
* How we presented Kafka data to Iceberg processors without moving or transforming data upfront—no hidden ETL!
* Integrating Kafka's ecosystem into Iceberg, leveraging Schema Registry, consumer groups, and more.
* Meeting Iceberg's performance and cost-reduction expectations while sourcing data directly from Kafka.

Expect a technical deep dive into the protocols, formats, and services we used, all while staying true to our core principles:
* Kafka as the single source of truth—no separate stores.
* Analytical processors shouldn't need Kafka-specific adjustments.
* Operational performance must remain uncompromised.
* Kafka's mature ecosystem features, like ACLs and quotas, should be reused, not reinvented.

Join us for a thrilling account of the highs and lows of merging two data giants, and stay tuned for the surprise twist at the end!
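To picture the consumer side of those principles, here is a minimal sketch of an analytical engine querying such a table with no Kafka-specific adjustments. The catalog name, REST endpoint, and table identifier are assumptions for illustration; the projection service that exposes the Kafka topic behind the table is what the talk itself describes.

```java
import org.apache.spark.sql.SparkSession;

public class QueryKafkaBackedTable {
    public static void main(String[] args) {
        // Requires the Iceberg Spark runtime jar on the classpath.
        // Catalog name, REST URI, and table name below are hypothetical.
        SparkSession spark = SparkSession.builder()
                .appName("iceberg-over-kafka")
                .master("local[*]")
                .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
                .config("spark.sql.catalog.lake.type", "rest")
                .config("spark.sql.catalog.lake.uri", "http://localhost:8181")
                .getOrCreate();

        // The analytical processor simply queries an Iceberg table;
        // it never needs to know a Kafka topic is the source of truth.
        spark.sql("SELECT region, COUNT(*) AS orders " +
                  "FROM lake.events.orders GROUP BY region")
             .show();
    }
}
```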