talk-data.com

Topic: apache iceberg · 21 tagged

Activity Trend: 2020-Q1 to 2026-Q1 (peak 1/qtr)

Activities

21 activities · Newest first

In this 2-hour hands-on workshop, you'll build an end-to-end streaming analytics pipeline that captures live cryptocurrency prices, processes them in real time, and uses AI to forecast the future. Ingest live crypto data into Apache Kafka using Kafka Connect; tame that chaos with Apache Flink's stream processing; freeze streams into queryable Apache Iceberg tables using Tableflow; and forecast price trends with Flink AI.
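
For a feel of the Kafka-to-Iceberg leg before the workshop, here is a minimal PyFlink sketch (not the workshop's Tableflow-based setup): it reads JSON crypto ticks from a Kafka topic and appends them to an Iceberg table. The topic name, schema, warehouse path, and checkpoint interval are illustrative, and the flink-sql-connector-kafka and iceberg-flink-runtime jars are assumed to be on the classpath.

```python
# Minimal Kafka -> Flink SQL -> Iceberg sketch (names and paths are illustrative).
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
# Iceberg commits happen on checkpoints, so enable checkpointing.
t_env.get_config().set("execution.checkpointing.interval", "30 s")

# Kafka source: live crypto ticks as JSON.
t_env.execute_sql("""
    CREATE TABLE crypto_ticks (
        symbol STRING,
        price  DOUBLE,
        ts     TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'crypto-prices',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# Iceberg catalog and sink table (a Hadoop catalog on local disk for simplicity).
t_env.execute_sql("""
    CREATE CATALOG lake WITH (
        'type' = 'iceberg',
        'catalog-type' = 'hadoop',
        'warehouse' = 'file:///tmp/iceberg-warehouse'
    )
""")
t_env.execute_sql("CREATE DATABASE IF NOT EXISTS lake.market")
t_env.execute_sql("""
    CREATE TABLE IF NOT EXISTS lake.market.crypto_prices (
        symbol STRING, price DOUBLE, ts TIMESTAMP(3)
    )
""")

# Continuously append the stream into the Iceberg table.
t_env.execute_sql("""
    INSERT INTO lake.market.crypto_prices
    SELECT symbol, price, ts FROM crypto_ticks
""").wait()
```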

Grafana's magic comes from seeing what's happening right now and instantly comparing it to everything that has happened before. Real-time data lets teams spot anomalies the moment they emerge. Long-term data reveals whether those anomalies are new, seasonal, or the same gremlins haunting your system every quarter. But actually building this capability? That's where everything gets messy. Today's dashboards are cobbled together from two very different worlds:
* Long-term data living in lakes and warehouses
* Real-time streams blasting through Kafka or similar systems
These systems rarely fit together cleanly, which forces dashboard developers to wrestle with:
* Differing processing concepts - What does SQL even mean on a stream?
* Inconsistent governance - Tables vs. message schemas, different owners, different rules
* Incomplete history - Not everything is kept forever, and you never know what will vanish next
* Maintenance drift - As pipelines evolve, your ETL always falls behind
But what if there were no separation at all? Join us for a deep dive into a new, unified approach where real-time and historical data live together in a single, seamless dataset. Imagine dashboards powered by one source of truth that stretches from less than one second ago to five, ten, or even fifteen years into the past, without stitching, syncing, or duct-taping systems together. Using Apache Kafka, Apache Iceberg, and a modern architectural pattern that eliminates the old 'batch vs. stream' divide, we'll show how to:
* Build Grafana dashboards that just work with consistent semantics end-to-end
* Keep every message forever without drowning in storage costs
* Query real-time and historical data with the same language, same governance, same everything
* Escape the ETL death spiral once and for all
If you've ever wished your dashboards were both lightning-fast and infinitely deep, this talk will show you how close that future really is.
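
As a rough illustration of the "one table, one query language" idea, here is a hedged pyiceberg sketch that answers both the "right now" and the "years back" questions against the same Iceberg table. The catalog URI, table name, and the plain timestamp column `ts` are assumptions for illustration, not part of the talk.

```python
# One Iceberg table, one filter language, two very different time windows.
from datetime import datetime, timedelta

from pyiceberg.catalog import load_catalog

catalog = load_catalog("lake", **{"type": "rest", "uri": "http://localhost:8181"})
metrics = catalog.load_table("ops.metrics")   # assumed table with a timestamp column `ts`

now = datetime.utcnow()
one_minute_ago = (now - timedelta(minutes=1)).isoformat(timespec="seconds")
five_years_ago = (now - timedelta(days=5 * 365)).isoformat(timespec="seconds")

# "Right now": the last 60 seconds.
live = metrics.scan(row_filter=f"ts >= '{one_minute_ago}'").to_arrow()

# "Everything before": five years back, same table, same predicate language.
history = metrics.scan(row_filter=f"ts >= '{five_years_ago}'").to_arrow()

print(live.num_rows, history.num_rows)
```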

While Kafka excels at streaming data, the real challenge lies in making that data analytically useful without sacrificing consistency or performance. This talk explores why Apache Iceberg has emerged as the ideal streaming destination, offering ACID transactions, schema evolution, and time travel capabilities that traditional data lakes can't match. Learn about some foundational tools that enable streaming pipelines and why they all converge on this next-generation table format built for flexibility and scalability.
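
As a small taste of those capabilities, here is a hedged pyiceberg sketch of time travel and schema evolution; the catalog configuration, table name, and column names are illustrative.

```python
# Hedged sketch: what "ACID tables with history" looks like from Python.
from pyiceberg.catalog import load_catalog
from pyiceberg.types import StringType

catalog = load_catalog("lake")              # catalog config comes from pyiceberg settings
table = catalog.load_table("events.clicks") # assumed table name

# Time travel: every committed write is a snapshot you can query again later.
for entry in table.history():
    print(entry.snapshot_id, entry.timestamp_ms)

first_snapshot = table.history()[0].snapshot_id
old_rows = table.scan(snapshot_id=first_snapshot).to_arrow()

# Schema evolution: add a column without rewriting existing data files.
with table.update_schema() as update:
    update.add_column("campaign", StringType())
```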

Distributed version control systems - such as Git - unlock software development in multi-player mode: devs can safely work on the same code base, with standard (albeit perhaps not user-friendly!) abstractions for snapshotting, time-travel, and branching. Data folks have rarely been so lucky, as their projects crucially depend on data, whose life-cycle management is often cumbersome and custom. In this talk, we present open formats - such as Apache Iceberg - to practitioners with limited exposure to modern cloud infrastructure. In particular, we show how moving from datasets to tables unlocks a similar multi-player mode when building data pipelines, with equivalent abstractions for snapshotting, time-travel, branching, and a unified backbone for pipelines, data science, and AI use cases.
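
To make the Git analogy concrete, here is a sketch of a branch-based workflow on an Iceberg table using Spark SQL with the Iceberg extensions; the catalog setup, table names, and the `staging_orders` source table are assumptions for illustration.

```python
# Git-style branching on an Iceberg table via Spark SQL (names are illustrative).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# "Branch" the table: new writes on the branch don't touch main.
spark.sql("ALTER TABLE lake.db.orders CREATE BRANCH etl_v2")

# Write to the branch using Iceberg's branch_<name> identifier convention.
spark.sql("INSERT INTO lake.db.orders.branch_etl_v2 SELECT * FROM staging_orders")

# Read main and the branch side by side.
main_count = spark.sql("SELECT count(*) FROM lake.db.orders").collect()[0][0]
branch_count = spark.sql(
    "SELECT count(*) FROM lake.db.orders VERSION AS OF 'etl_v2'").collect()[0][0]
print(main_count, branch_count)
```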

Apache Iceberg brings powerful metadata to the table—but how do you turn it into governance that scales without slowing teams down? In this talk, we’ll explore how Lakekeeper builds on Iceberg’s foundation to make data management, fine-grained access control, and cross-platform interoperability seamless. Learn how metadata is becoming the backbone of modern data platforms, and why that matters for anyone using Iceberg together with engines like ClickHouse.
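
Since Lakekeeper implements the Iceberg REST catalog protocol, any standard REST-catalog client can talk to it. The sketch below uses pyiceberg; the endpoint, warehouse name, and token are placeholders rather than Lakekeeper's actual defaults.

```python
# Connecting a generic Iceberg REST-catalog client to a Lakekeeper-style endpoint.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lakekeeper",
    **{
        "type": "rest",
        "uri": "https://lakekeeper.example.com/catalog",  # placeholder endpoint
        "warehouse": "analytics",                         # placeholder warehouse
        "token": "<oauth-token>",  # fine-grained access is enforced server-side
    },
)

# From here it is plain Iceberg: list namespaces, load tables, scan with any engine.
print(catalog.list_namespaces())
table = catalog.load_table("analytics_db.page_views")     # assumed table
print(table.schema())
```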

Discussion: Apache Iceberg, Apache Kafka, and Apache Polaris (incubating) walk into a bar: populating Iceberg tables. After a quick introduction to Apache Iceberg, Apache Kafka, and Apache Polaris (incubating), we'll look at the possible approaches for ingesting data into Iceberg tables with Kafka and the issues to consider: snapshot management, data file compaction, etc. We'll then look at possible solutions and how Apache Polaris (incubating) can help with table maintenance.

After a quick introduction to Apache Iceberg, Apache Kafka, and Apache Polaris (incubating), we'll look at the possible approaches for ingesting data into Iceberg tables with Kafka and the issues to consider: snapshot management, data file compaction, etc. We'll then look at possible solutions and how Apache Polaris (incubating) can help with table maintenance.
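
The maintenance issues these abstracts mention (snapshot management, compaction of small data files) map onto Iceberg's stored procedures. Below is a hedged Spark sketch of that housekeeping; catalog and table names are illustrative, and the same calls apply whichever catalog, Apache Polaris included, fronts the table.

```python
# Compacting small files from a Kafka sink and expiring old snapshots,
# via Iceberg's Spark procedures (catalog/table names are illustrative).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes an Iceberg-enabled session

# Compact the many small data files produced by streaming ingest.
spark.sql("""
    CALL lake.system.rewrite_data_files(
        table => 'db.events',
        options => map('target-file-size-bytes', '134217728')
    )
""")

# Keep the snapshot history bounded.
spark.sql("""
    CALL lake.system.expire_snapshots(
        table => 'db.events',
        older_than => TIMESTAMP '2024-01-01 00:00:00',
        retain_last => 20
    )
""")
```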

Dive into the heart of the Trino connector for Apache Iceberg! Going beyond the basics, we invite you to discover the latest additions and the most advanced features. Through live demos, we'll explore key topics: managing branches and tags tied to snapshots; maintenance options for your Iceberg tables; extended metastore (catalog) support. This talk is your chance to master often-overlooked aspects and optimize your Iceberg tables with Trino.
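
For the curious, here is a small hedged sketch of exploring some of those connector features through the trino Python client; the host, catalog, schema, table, and snapshot id are illustrative, and branch/tag management is left out because its syntax depends on the Trino version.

```python
# Poking at Iceberg snapshots, time travel, and maintenance through Trino.
import trino

conn = trino.dbapi.connect(
    host="trino.example.com", port=8080, user="analyst",
    catalog="iceberg", schema="web",
)
cur = conn.cursor()

# Inspect the table's snapshots via the hidden $snapshots metadata table.
cur.execute('SELECT snapshot_id, committed_at FROM "events$snapshots" ORDER BY committed_at')
for snapshot_id, committed_at in cur.fetchall():
    print(snapshot_id, committed_at)

# Time travel to a specific snapshot (id is illustrative).
cur.execute("SELECT count(*) FROM events FOR VERSION AS OF 4732182538433687200")
print(cur.fetchone())

# Maintenance from Trino itself: compact small files, expire old snapshots.
cur.execute("ALTER TABLE events EXECUTE optimize")
cur.fetchall()  # drain results so the statement runs to completion
cur.execute("ALTER TABLE events EXECUTE expire_snapshots(retention_threshold => '7d')")
cur.fetchall()
```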

A detailed look at the Apache Iceberg and Trino projects with Julien Thiaw-Kine and Victor Coustenoble. An overview of the landscape, the players, the promises, and why the Iceberg and Trino combination makes sense. Separating compute from storage with Iceberg changes the way we think about data architectures. The multi-engine approach lets every type of workload run on the engine best suited to it. Use cases and lessons learned from using Iceberg & Trino at OVHcloud.

ClickHouse® is famous for real-time response, cost-efficiency, and flexible Apache 2.0 licensing. In the first half of this talk, we'll show how ClickHouse® implements sub-second analytics with columnar storage, vectorized query execution, and compression on datasets running to hundreds of terabytes. We'll then pivot and discuss how Altinity is adapting open source ClickHouse® to deliver economical, real-time response on 100 petabytes of data or more.

Streaming data with Apache Kafka® has become the backbone of modern applications. While streams are ideal for continuous data flow, they lack built-in querying capability. Unlike databases with indexed lookups, Kafka's append-only logs are designed for high-throughput processing, not for on-demand querying. This forces teams to build additional infrastructure to enable query capabilities for streaming data. Traditional approaches replicate this data into external stores: relational databases like PostgreSQL for operational workloads, and object storage like S3 with Flink, Spark, or Trino for analytical use cases. While sometimes useful, these approaches deepen the divide between the operational and analytical estates, creating silos, complex ETL pipelines, and issues with schema mismatches, freshness, and failures.

In this session, we'll explore and see live demos of some solutions to unify the operational and analytical estates, eliminating data silos. We'll start with stream processing using Kafka Streams, Apache Flink®, and SQL implementations, then cover integration of relational databases with real-time analytics databases such as Apache Pinot® and ClickHouse. Finally, we'll dive into modern approaches like Apache Iceberg® with Tableflow, which simplifies data preparation by seamlessly representing Kafka topics and their associated schemas as Iceberg or Delta tables in a few clicks. While there's no single right answer to this problem, as responsible system builders we must understand our options and trade-offs to build robust architectures.
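
The core problem the session starts from, that an append-only log has no indexed lookup, is easy to see in code. The toy sketch below contrasts scanning a Kafka topic for one customer with filtering the same data exposed as an Iceberg table; the broker address, topic, catalog, and table names are illustrative, and the Iceberg table is assumed to be a materialization of the topic (for example via a sink connector or Tableflow).

```python
# Toy contrast of the two access patterns (all names are illustrative).
import json

from kafka import KafkaConsumer          # kafka-python
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import EqualTo

# 1) Kafka: an append-only log. Finding one customer's orders means scanning the topic.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
    value_deserializer=lambda v: json.loads(v),
)
matches = [msg.value for msg in consumer if msg.value.get("customer_id") == "c-42"]

# 2) Iceberg: the same data as a table, where the predicate can prune files.
table = load_catalog("lake").load_table("sales.orders")
rows = table.scan(row_filter=EqualTo("customer_id", "c-42")).to_arrow()

print(len(matches), rows.num_rows)
```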

An organisation's data has traditionally been split between the operational estate, for daily business operations, and the analytical estate, for after-the-fact analysis and reporting. The journey from one side to the other is today a long and tortuous one. But does it have to be?

In the modern data stack, Apache Kafka is your de facto standard operational platform and Apache Iceberg has emerged as the champion of table formats to power analytical applications. Can we leverage the best of Iceberg and Kafka to create a powerful solution greater than the sum of its parts?

Yes you can, and we did!

This isn't a typical story of connectors, ELT, and separate data stores. We've developed an advanced projection of Kafka data in an Iceberg-compatible format, allowing direct access from warehouses and analytical tools.

In this talk, we'll cover:
* How we presented Kafka data to Iceberg processors without moving or transforming data upfront—no hidden ETL!
* Integrating Kafka's ecosystem into Iceberg, leveraging Schema Registry, consumer groups, and more.
* Meeting Iceberg's performance and cost-reduction expectations while sourcing data directly from Kafka.

Expect a technical deep dive into the protocols, formats, and services we used, all while staying true to our core principles:
* Kafka as the single source of truth—no separate stores.
* Analytical processors shouldn't need Kafka-specific adjustments.
* Operational performance must remain uncompromised.
* Kafka's mature ecosystem features, like ACLs and quotas, should be reused, not reinvented.

Join us for a thrilling account of the highs and lows of merging two data giants, and stay tuned for the surprise twist at the end!