talk-data.com

Topic: apache iceberg · 21 tagged

Activity Trend: 2020-Q1 to 2026-Q1 (peak 1/qtr)

Activities

21 activities · Newest first

In this 2-hour hands-on workshop, you'll build an end-to-end streaming analytics pipeline that captures live cryptocurrency prices, processes them in real time, and uses AI to forecast the future. Ingest live crypto data into Apache Kafka using Kafka Connect; tame that chaos with Apache Flink's stream processing; freeze streams into queryable Apache Iceberg tables using Tableflow; and forecast price trends with Flink AI.
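
For a feel of the Kafka-to-Iceberg leg before the workshop, here is a minimal PyFlink sketch (not the workshop's Tableflow-based setup): it reads JSON crypto ticks from a Kafka topic and appends them to an Iceberg table. The topic name, schema, warehouse path, and checkpoint interval are illustrative, and the flink-sql-connector-kafka and iceberg-flink-runtime jars are assumed to be on the classpath.

```python
# Minimal Kafka -> Flink SQL -> Iceberg sketch (names and paths are illustrative).
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
# Iceberg commits happen on checkpoints, so enable checkpointing.
t_env.get_config().set("execution.checkpointing.interval", "30 s")

# Kafka source: live crypto ticks as JSON.
t_env.execute_sql("""
    CREATE TABLE crypto_ticks (
        symbol STRING,
        price  DOUBLE,
        ts     TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'crypto-prices',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# Iceberg catalog and sink table (a Hadoop catalog on local disk for simplicity).
t_env.execute_sql("""
    CREATE CATALOG lake WITH (
        'type' = 'iceberg',
        'catalog-type' = 'hadoop',
        'warehouse' = 'file:///tmp/iceberg-warehouse'
    )
""")
t_env.execute_sql("CREATE DATABASE IF NOT EXISTS lake.market")
t_env.execute_sql("""
    CREATE TABLE IF NOT EXISTS lake.market.crypto_prices (
        symbol STRING, price DOUBLE, ts TIMESTAMP(3)
    )
""")

# Continuously append the stream into the Iceberg table.
t_env.execute_sql("""
    INSERT INTO lake.market.crypto_prices
    SELECT symbol, price, ts FROM crypto_ticks
""").wait()
```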

Grafana's magic comes from seeing what's happening right now and instantly comparing it to everything that has happened before. Real-time data lets teams spot anomalies the moment they emerge. Long-term data reveals whether those anomalies are new, seasonal, or the same gremlins haunting your system every quarter. But actually building this capability? That's where everything gets messy. Today's dashboards are cobbled together from two very different worlds:
* Long-term data living in lakes and warehouses
* Real-time streams blasting through Kafka or similar systems
These systems rarely fit together cleanly, which forces dashboard developers to wrestle with:
* Differing processing concepts - What does SQL even mean on a stream?
* Inconsistent governance - Tables vs. message schemas, different owners, different rules
* Incomplete history - Not everything is kept forever, and you never know what will vanish next
* Maintenance drift - As pipelines evolve, your ETL always falls behind
But what if there were no separation at all? Join us for a deep dive into a new, unified approach where real-time and historical data live together in a single, seamless dataset. Imagine dashboards powered by one source of truth that stretches from less than one second ago to five, ten, or even fifteen years into the past, without stitching, syncing, or duct-taping systems together. Using Apache Kafka, Apache Iceberg, and a modern architectural pattern that eliminates the old 'batch vs. stream' divide, we'll show how to:
* Build Grafana dashboards that just work with consistent semantics end-to-end
* Keep every message forever without drowning in storage costs
* Query real-time and historical data with the same language, same governance, same everything
* Escape the ETL death spiral once and for all
If you've ever wished your dashboards were both lightning-fast and infinitely deep, this talk will show you how close that future really is.
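
As a rough illustration of the "one table, one query language" idea, here is a hedged pyiceberg sketch that answers both the "right now" and the "years back" questions against the same Iceberg table. The catalog URI, table name, and the plain timestamp column `ts` are assumptions for illustration, not part of the talk.

```python
# One Iceberg table, one filter language, two very different time windows.
from datetime import datetime, timedelta

from pyiceberg.catalog import load_catalog

catalog = load_catalog("lake", **{"type": "rest", "uri": "http://localhost:8181"})
metrics = catalog.load_table("ops.metrics")   # assumed table with a timestamp column `ts`

now = datetime.utcnow()
one_minute_ago = (now - timedelta(minutes=1)).isoformat(timespec="seconds")
five_years_ago = (now - timedelta(days=5 * 365)).isoformat(timespec="seconds")

# "Right now": the last 60 seconds.
live = metrics.scan(row_filter=f"ts >= '{one_minute_ago}'").to_arrow()

# "Everything before": five years back, same table, same predicate language.
history = metrics.scan(row_filter=f"ts >= '{five_years_ago}'").to_arrow()

print(live.num_rows, history.num_rows)
```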

While Kafka excels at streaming data, the real challenge lies in making that data analytically useful without sacrificing consistency or performance. This talk explores why Apache Iceberg has emerged as the ideal streaming destination, offering ACID transactions, schema evolution, and time travel capabilities that traditional data lakes can't match. Learn about some foundational tools that enable streaming pipelines and why they all converge on this next-generation table format built for flexibility and scalability.
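
As a small taste of those capabilities, here is a hedged pyiceberg sketch of time travel and schema evolution; the catalog configuration, table name, and column names are illustrative.

```python
# Hedged sketch: what "ACID tables with history" looks like from Python.
from pyiceberg.catalog import load_catalog
from pyiceberg.types import StringType

catalog = load_catalog("lake")              # catalog config comes from pyiceberg settings
table = catalog.load_table("events.clicks") # assumed table name

# Time travel: every committed write is a snapshot you can query again later.
for entry in table.history():
    print(entry.snapshot_id, entry.timestamp_ms)

first_snapshot = table.history()[0].snapshot_id
old_rows = table.scan(snapshot_id=first_snapshot).to_arrow()

# Schema evolution: add a column without rewriting existing data files.
with table.update_schema() as update:
    update.add_column("campaign", StringType())
```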

Distributed version control systems - such as Git - unlock software development in multi-player mode: devs can safely work on the same code base, with standard (albeit perhaps not user-friendly!) abstractions for snapshotting, time-travel, and branching. Data folks have rarely been so lucky, as their projects crucially depend on data, whose life-cycle management is often cumbersome and custom. In this talk, we present open formats - such as Apache Iceberg - to practitioners with limited exposure to modern cloud infrastructure. In particular, we show how moving from datasets to tables unlocks a similar multi-player mode when building data pipelines, with equivalent abstractions for snapshotting, time-travel, branching, and a unified backbone for pipelines, data science, and AI use cases.
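
To make the Git analogy concrete, here is a sketch of a branch-based workflow on an Iceberg table using Spark SQL with the Iceberg extensions; the catalog setup, table names, and the `staging_orders` source table are assumptions for illustration.

```python
# Git-style branching on an Iceberg table via Spark SQL (names are illustrative).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# "Branch" the table: new writes on the branch don't touch main.
spark.sql("ALTER TABLE lake.db.orders CREATE BRANCH etl_v2")

# Write to the branch using Iceberg's branch_<name> identifier convention.
spark.sql("INSERT INTO lake.db.orders.branch_etl_v2 SELECT * FROM staging_orders")

# Read main and the branch side by side.
main_count = spark.sql("SELECT count(*) FROM lake.db.orders").collect()[0][0]
branch_count = spark.sql(
    "SELECT count(*) FROM lake.db.orders VERSION AS OF 'etl_v2'").collect()[0][0]
print(main_count, branch_count)
```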

Apache Iceberg brings powerful metadata to the table—but how do you turn it into governance that scales without slowing teams down? In this talk, we’ll explore how Lakekeeper builds on Iceberg’s foundation to make data management, fine-grained access control, and cross-platform interoperability seamless. Learn how metadata is becoming the backbone of modern data platforms, and why that matters for anyone using Iceberg together with engines like ClickHouse.
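
Since Lakekeeper implements the Iceberg REST catalog protocol, any standard REST-catalog client can talk to it. The sketch below uses pyiceberg; the endpoint, warehouse name, and token are placeholders rather than Lakekeeper's actual defaults.

```python
# Connecting a generic Iceberg REST-catalog client to a Lakekeeper-style endpoint.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lakekeeper",
    **{
        "type": "rest",
        "uri": "https://lakekeeper.example.com/catalog",  # placeholder endpoint
        "warehouse": "analytics",                         # placeholder warehouse
        "token": "<oauth-token>",  # fine-grained access is enforced server-side
    },
)

# From here it is plain Iceberg: list namespaces, load tables, scan with any engine.
print(catalog.list_namespaces())
table = catalog.load_table("analytics_db.page_views")     # assumed table
print(table.schema())
```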

Discussion: Apache Iceberg, Apache Kafka, and Apache Polaris (incubating) walk into a bar: populating Iceberg tables. After a quick introduction to Apache Iceberg, Apache Kafka, and Apache Polaris (incubating), we'll look at the possible approaches for ingesting data into Iceberg tables with Kafka and the issues to consider: snapshot management, data file compaction, etc. We'll then look at possible solutions and how Apache Polaris (incubating) can help with table maintenance.

After a quick introduction to Apache Iceberg, Apache Kafka, and Apache Polaris (incubating), we'll look at the possible approaches for ingesting data into Iceberg tables with Kafka and the issues to consider: snapshot management, data file compaction, etc. We'll then look at possible solutions and how Apache Polaris (incubating) can help with table maintenance.
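
The maintenance issues these abstracts mention (snapshot management, compaction of small data files) map onto Iceberg's stored procedures. Below is a hedged Spark sketch of that housekeeping; catalog and table names are illustrative, and the same calls apply whichever catalog, Apache Polaris included, fronts the table.

```python
# Compacting small files from a Kafka sink and expiring old snapshots,
# via Iceberg's Spark procedures (catalog/table names are illustrative).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes an Iceberg-enabled session

# Compact the many small data files produced by streaming ingest.
spark.sql("""
    CALL lake.system.rewrite_data_files(
        table => 'db.events',
        options => map('target-file-size-bytes', '134217728')
    )
""")

# Keep the snapshot history bounded.
spark.sql("""
    CALL lake.system.expire_snapshots(
        table => 'db.events',
        older_than => TIMESTAMP '2024-01-01 00:00:00',
        retain_last => 20
    )
""")
```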

Dive into the heart of the Trino connector for Apache Iceberg! Going beyond the basics, we invite you to discover the latest additions and the most advanced features. Through live demos, we'll explore key topics: managing branches and tags tied to snapshots; maintenance options for your Iceberg tables; extended metastore (catalog) support. This talk is your chance to master often-overlooked aspects and optimize your Iceberg tables with Trino.
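
For the curious, here is a small hedged sketch of exploring some of those connector features through the trino Python client; the host, catalog, schema, table, and snapshot id are illustrative, and branch/tag management is left out because its syntax depends on the Trino version.

```python
# Poking at Iceberg snapshots, time travel, and maintenance through Trino.
import trino

conn = trino.dbapi.connect(
    host="trino.example.com", port=8080, user="analyst",
    catalog="iceberg", schema="web",
)
cur = conn.cursor()

# Inspect the table's snapshots via the hidden $snapshots metadata table.
cur.execute('SELECT snapshot_id, committed_at FROM "events$snapshots" ORDER BY committed_at')
for snapshot_id, committed_at in cur.fetchall():
    print(snapshot_id, committed_at)

# Time travel to a specific snapshot (id is illustrative).
cur.execute("SELECT count(*) FROM events FOR VERSION AS OF 4732182538433687200")
print(cur.fetchone())

# Maintenance from Trino itself: compact small files, expire old snapshots.
cur.execute("ALTER TABLE events EXECUTE optimize")
cur.fetchall()  # drain results so the statement runs to completion
cur.execute("ALTER TABLE events EXECUTE expire_snapshots(retention_threshold => '7d')")
cur.fetchall()
```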

A detailed look at the Apache Iceberg and Trino projects with Julien Thiaw-Kine and Victor Coustenoble. An overview of the landscape, the players, the promises, and why the Iceberg and Trino combination makes sense. Separating compute from storage with Iceberg changes the way we think about data architectures. The multi-engine approach lets every type of workload run on the engine best suited to it. Use cases and lessons learned from using Iceberg & Trino at OVHcloud.

ClickHouse® is famous for real-time response, cost-efficiency, and flexible Apache 2.0 licensing. In the first half of this talk, we'll show how ClickHouse® implements sub-second analytics with columnar storage, vectorized query execution, and compression on datasets running to hundreds of terabytes. We'll then pivot and discuss how Altinity is adapting open source ClickHouse® to deliver economical, real-time response on 100 petabytes of data or more.

Streaming data with Apache Kafka® has become the backbone of modern applications. While streams are ideal for continuous data flow, they lack built-in querying capability. Unlike databases with indexed lookups, Kafka's append-only logs are designed for high-throughput processing, not for on-demand querying. This forces teams to build additional infrastructure to enable query capabilities for streaming data. Traditional approaches replicate this data into external stores: relational databases like PostgreSQL for operational workloads, and object storage like S3 with Flink, Spark, or Trino for analytical use cases. While sometimes useful, these approaches deepen the divide between the operational and analytical estates, creating silos, complex ETL pipelines, and issues with schema mismatches, freshness, and failures.

In this session, we'll explore and see live demos of some solutions to unify the operational and analytical estates, eliminating data silos. We'll start with stream processing using Kafka Streams, Apache Flink®, and SQL implementations, then cover integration of relational databases with real-time analytics databases such as Apache Pinot® and ClickHouse. Finally, we'll dive into modern approaches like Apache Iceberg® with Tableflow, which simplifies data preparation by seamlessly representing Kafka topics and their associated schemas as Iceberg or Delta tables in a few clicks. While there's no single right answer to this problem, as responsible system builders we must understand our options and trade-offs to build robust architectures.
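
The core problem the session starts from, that an append-only log has no indexed lookup, is easy to see in code. The toy sketch below contrasts scanning a Kafka topic for one customer with filtering the same data exposed as an Iceberg table; the broker address, topic, catalog, and table names are illustrative, and the Iceberg table is assumed to be a materialization of the topic (for example via a sink connector or Tableflow).

```python
# Toy contrast of the two access patterns (all names are illustrative).
import json

from kafka import KafkaConsumer          # kafka-python
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import EqualTo

# 1) Kafka: an append-only log. Finding one customer's orders means scanning the topic.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
    value_deserializer=lambda v: json.loads(v),
)
matches = [msg.value for msg in consumer if msg.value.get("customer_id") == "c-42"]

# 2) Iceberg: the same data as a table, where the predicate can prune files.
table = load_catalog("lake").load_table("sales.orders")
rows = table.scan(row_filter=EqualTo("customer_id", "c-42")).to_arrow()

print(len(matches), rows.num_rows)
```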

An organisation's data has traditionally been split between the operational estate, for daily business operations, and the analytical estate, for after-the-fact analysis and reporting. The journey from one side to the other is today a long and tortuous one. But does it have to be?

In the modern data stack, Apache Kafka is your de facto standard operational platform and Apache Iceberg has emerged as the champion of table formats to power analytical applications. Can we leverage the best of Iceberg and Kafka to create a powerful solution greater than the sum of its parts?

Yes you can, and we did!

This isn't a typical story of connectors, ELT, and separate data stores. We've developed an advanced projection of Kafka data in an Iceberg-compatible format, allowing direct access from warehouses and analytical tools.

In this talk, we'll cover:
* How we presented Kafka data to Iceberg processors without moving or transforming data upfront—no hidden ETL!
* Integrating Kafka's ecosystem into Iceberg, leveraging Schema Registry, consumer groups, and more.
* Meeting Iceberg's performance and cost-reduction expectations while sourcing data directly from Kafka.

Expect a technical deep dive into the protocols, formats, and services we used, all while staying true to our core principles:
* Kafka as the single source of truth—no separate stores.
* Analytical processors shouldn't need Kafka-specific adjustments.
* Operational performance must remain uncompromised.
* Kafka's mature ecosystem features, like ACLs and quotas, should be reused, not reinvented.

Join us for a thrilling account of the highs and lows of merging two data giants, and stay tuned for the surprise twist at the end!