In this 2-hour hands-on workshop, you'll build an end-to-end streaming analytics pipeline that captures live cryptocurrency prices, processes them in real-time, and uses AI to forecast the future. Ingest live crypto data into Apache Kafka using Kafka Connect; tame that chaos with Apache Flink's stream processing; freeze streams into queryable Apache Iceberg tables using Tableflow; and forecast price trends with Flink AI.
Talks & appearances from Confluent speakers
We were told to scale compute. But what if the real problem was never about big data, but about bad data access? In this talk, we’ll unpack two powerful, often misunderstood techniques—projection pushdown and predicate pushdown—and why they matter more than ever in a world where we want lightweight, fast queries over large datasets. These optimizations aren’t just academic—they’re the difference between querying a terabyte in seconds vs. minutes. We’ll show how systems like Flink and DuckDB leverage these techniques, what limits them (hello, Protobuf), and how smart schema and storage design, especially in formats like Iceberg and Arrow, can unlock dramatic speed gains. Along the way, we’ll highlight the importance of landing data in queryable formats, and why indexing and query engines matter just as much as compute. This talk is for anyone who wants to stop fully scanning their data lakes just to read one field.
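As a rough, self-contained illustration of what pushdown buys you (not code from the talk), the sketch below queries Parquet files through DuckDB's JDBC driver; the file path and column names are made up.

```java
// Minimal sketch of projection + predicate pushdown with DuckDB's JDBC driver.
// The Parquet path and column names (amount, country) are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PushdownSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement stmt = conn.createStatement()) {
            // Only the 'amount' column is read (projection pushdown), and row groups
            // whose min/max statistics exclude country = 'DE' are skipped entirely
            // (predicate pushdown), so most of the file is never touched.
            ResultSet rs = stmt.executeQuery(
                "SELECT sum(amount) " +
                "FROM read_parquet('events/*.parquet') " +
                "WHERE country = 'DE'");
            while (rs.next()) {
                System.out.println("total: " + rs.getDouble(1));
            }
        }
    }
}
```

Because only one column is decoded and entire row groups can be skipped via statistics, the scan touches a fraction of the data; with opaque row-oriented blobs such as Protobuf, the engine has no columnar layout or statistics to push into, which is the limitation the talk alludes to.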
Kafka 4.1 promotes KIP-932, “Queues for Kafka,” to preview status. Beyond the KIP's provocative title, it introduces "share groups" into Kafka, adopting some of the mechanisms found in queuing systems. However, there are fundamental differences from JMS or MQ. We will therefore explore the KIP in detail: its APIs and the problems it solves, as well as its limitations and potential future developments.
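To make the share-group model concrete, here is a heavily hedged sketch based on the consumer API proposed in KIP-932; class and method names follow the KIP and may differ in the released version, and the topic and group names are made up.

```java
// Hedged sketch of a KIP-932 share-group consumer, following the API proposed in the KIP.
// Names may differ in the released version; "orders" and "order-workers" are made up.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.AcknowledgeType;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaShareConsumer;

public class ShareGroupSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-workers"); // used as a share group, not a consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaShareConsumer<String, String> consumer = new KafkaShareConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        process(record.value());
                        consumer.acknowledge(record, AcknowledgeType.ACCEPT);  // done, don't redeliver
                    } catch (Exception e) {
                        consumer.acknowledge(record, AcknowledgeType.RELEASE); // make it available again
                    }
                }
                // Depending on configuration, acknowledgements may also be sent implicitly on the next poll.
                consumer.commitSync();
            }
        }
    }

    static void process(String value) { /* ... */ }
}
```

The queue-like behaviour comes from records being acknowledged individually and from several members of the share group being able to consume the same partition, unlike a classic consumer group.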
Moving data between operational systems and analytics platforms is often a painful process. Traditional pipelines that transfer data in and out of warehouses tend to become complex, brittle, and expensive to maintain over time.
Much of this complexity, however, is avoidable. Data in motion and data at rest—Kafka Topics and Iceberg Tables—can be treated as two sides of the same coin. By establishing an equivalence between Topics and Tables, it’s possible to transparently map between them and rethink how pipelines are built.
This talk introduces a declarative approach to bridging streaming and table-based systems. By shifting complexity into the data layer, we can decompose complex, imperative pipelines into simpler, more reliable workflows.
We’ll explore the design principles behind this approach, including schema mapping and evolution between Kafka and Iceberg, and how to build a system that can continuously materialize and optimize hundreds of thousands of topics as Iceberg tables.
Whether you're building new pipelines or modernizing legacy systems, this session will provide practical patterns and strategies for creating resilient, scalable, and future-proof data architectures.
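As one hedged illustration of treating a topic and a table as two views of the same data (not the system described in this talk), the sketch below uses Flink SQL from Java to declare a Kafka topic and an Iceberg catalog and to continuously materialize one into the other; the schema, topic name, warehouse path, and connector options are assumptions, and the Kafka and Iceberg connector jars must be on the classpath.

```java
// Rough sketch: declaratively materialize a Kafka topic into an Iceberg table with Flink SQL.
// Topic name, schema, warehouse path, and connector options are illustrative assumptions.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TopicToTableSketch {
    public static void main(String[] args) {
        TableEnvironment env =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // The stream side: a Kafka topic exposed as a dynamic table.
        env.executeSql(
            "CREATE TABLE orders_stream (" +
            "  order_id STRING, amount DOUBLE, ts TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'orders'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'scan.startup.mode' = 'earliest-offset'," +
            "  'format' = 'json')");

        // The table side: an Iceberg catalog backed by a local warehouse directory.
        env.executeSql(
            "CREATE CATALOG lake WITH (" +
            "  'type' = 'iceberg'," +
            "  'catalog-type' = 'hadoop'," +
            "  'warehouse' = 'file:///tmp/warehouse')");
        env.executeSql("CREATE DATABASE IF NOT EXISTS lake.db");
        env.executeSql(
            "CREATE TABLE IF NOT EXISTS lake.db.orders (" +
            "  order_id STRING, amount DOUBLE, ts TIMESTAMP(3))");

        // The 'pipeline' is a single declarative statement that runs continuously.
        env.executeSql("INSERT INTO lake.db.orders SELECT * FROM orders_stream");
    }
}
```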
Data streaming is a really difficult problem. Despite 10+ years of attempting to simplify it, teams building real-time data pipelines can spend up to 80% of their time optimizing it or fixing downstream output by handling bad data at the lake. All we want is a service that will be reliable, handle all kinds of data, connect with all kinds of systems, be easy to manage, and scale up and down as our systems change. Oh, it should also have super low latency and result in good data. Is it too much to ask?
In this presentation, you’ll learn the basics of data streaming and architecture patterns such as DLQ, used to tackle these challenges. We will then explore how to implement these patterns using Apache Flink and discuss the challenges that real-time AI applications bring to our infra. Difficult problems are difficult, and we offer no silver bullets. Still, we will share pragmatic solutions that have helped many organizations build fast, scalable, and manageable data streaming pipelines.
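One concrete shape of the DLQ pattern mentioned above is a Flink side output: records that fail processing are diverted to a secondary stream instead of failing the job. The parsing logic and sinks below are placeholders for illustration.

```java
// Sketch of a dead-letter-queue pattern in Flink using a side output:
// records that fail to parse go to a "dlq" stream instead of crashing the pipeline.
// The parsing logic and the print sinks are placeholders.
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class DlqSketch {
    // Side-output tag for raw records that could not be processed.
    static final OutputTag<String> DLQ = new OutputTag<String>("dlq") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> raw = env.fromElements("42", "17", "not-a-number");

        SingleOutputStreamOperator<Integer> parsed = raw.process(
            new ProcessFunction<String, Integer>() {
                @Override
                public void processElement(String value, Context ctx, Collector<Integer> out) {
                    try {
                        out.collect(Integer.parseInt(value)); // good record: main stream
                    } catch (NumberFormatException e) {
                        ctx.output(DLQ, value);               // bad record: dead-letter stream
                    }
                }
            });

        parsed.print();                    // in practice: sink to the real topic or table
        parsed.getSideOutput(DLQ).print(); // in practice: sink to a dead-letter Kafka topic

        env.execute("dlq-sketch");
    }
}
```

In a real pipeline both streams would be written to Kafka topics, with the dead-letter topic feeding reprocessing or alerting rather than being handled downstream at the lake.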
Abstract: TBA
Abstract: Detecting problems as they happen is essential in today’s fast-moving, data-driven world. In this talk, you’ll learn how to build a flexible, real-time anomaly detection pipeline using Apache Kafka and Apache Flink, backed by statistical and machine learning models. We’ll start by demystifying what anomaly really means, exploring the different types (point, contextual, and collective anomalies) and the difference between unintentional issues and intentional outliers like fraud or abuse.

Then, we’ll look at how anomaly detection is solved in practice: from classical statistical models like ARIMA to deep learning models like LSTM. You’ll learn how ARIMA breaks time series into AutoRegressive, Integrated, and Moving Average components, no math degree required (just a Python library). We’ll also uncover why forgetting is a feature, not a bug, when it comes to LSTMs, and how these models learn to detect complex patterns over time.

Throughout, we’ll show how Kafka handles high-throughput streaming data and how Flink enables low-latency, stateful processing to catch issues as they emerge. You’ll leave knowing not just how these systems work, but when to use each type of model depending on your data and goals. Whether you're monitoring system health, tracking IoT devices, or looking for fraud in transactions, this talk will give you the foundations and tools to detect the unexpected, before it becomes a problem.
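ARIMA and LSTM models don't fit in a short snippet, so as a deliberately simpler stand-in, the sketch below flags values more than three standard deviations from a per-key running mean inside a Flink keyed function; the (sensorId, value) shape and the thresholds are assumptions, not something taken from the talk.

```java
// Hedged, simplified stand-in for the models discussed above: a per-key rolling
// z-score detector in Flink. Flags a reading more than 3 standard deviations from
// the running mean. The (sensorId, value) shape and the thresholds are assumptions.
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ZScoreDetector
        extends KeyedProcessFunction<String, Tuple2<String, Double>, String> {

    // Running count, mean, and M2 (sum of squared deviations) per key, Welford-style.
    private transient ValueState<double[]> stats;

    @Override
    public void open(Configuration parameters) {
        stats = getRuntimeContext().getState(
            new ValueStateDescriptor<>("stats", double[].class));
    }

    @Override
    public void processElement(Tuple2<String, Double> reading, Context ctx, Collector<String> out)
            throws Exception {
        double[] s = stats.value();
        if (s == null) s = new double[] {0.0, 0.0, 0.0}; // count, mean, M2

        double x = reading.f1;
        if (s[0] >= 30) { // only score once we have a minimal history
            double std = Math.sqrt(s[2] / s[0]);
            if (std > 0 && Math.abs(x - s[1]) / std > 3.0) {
                out.collect("anomaly on " + reading.f0 + ": " + x);
            }
        }

        // Welford's online update of count, mean, and M2.
        s[0] += 1;
        double delta = x - s[1];
        s[1] += delta / s[0];
        s[2] += delta * (x - s[1]);
        stats.update(s);
    }
}
```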
Streaming data with Apache Kafka® has become the backbone of modern-day applications. While streams are ideal for continuous data flow, they lack built-in querying capability. Unlike databases with indexed lookups, Kafka's append-only logs are designed for high-throughput processing, not for on-demand querying. This forces teams to build additional infrastructure to enable query capabilities for streaming data. Traditional methods replicate this data into external stores such as relational databases like PostgreSQL for operational workloads and object storage like S3 with Flink, Spark, or Trino for analytical use cases. While useful sometimes, these methods deepen the divide between operational and analytical estates, creating silos, complex ETL pipelines, and issues with schema mismatches, freshness, and failures.

In this session, we’ll explore and see live demos of some solutions to unify the operational and analytical estates, eliminating data silos. We’ll start with stream processing using Kafka Streams, Apache Flink®, and SQL implementations, then cover integration of relational databases with real-time analytics databases such as Apache Pinot® and ClickHouse. Finally, we’ll dive into modern approaches like Apache Iceberg® with Tableflow, which simplifies data preparation by seamlessly representing Kafka topics and associated schemas as Iceberg or Delta tables in a few clicks. While there's no single right answer to this problem, as responsible system builders, we must understand our options and trade-offs to build robust architectures.
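One of the options in this space, serving lookups from the stream processor itself rather than an external store, can be sketched with Kafka Streams interactive queries; the topic, store name, and broker address below are made up.

```java
// Sketch of querying streaming data without an external database:
// a Kafka Streams count materialized to a local store and read via interactive queries.
// Topic name, store name, and bootstrap server are illustrative.
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class QueryableStreamSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("page-views")   // key = page, value ignored
               .groupByKey()
               .count(Materialized.as("view-counts")); // materialized, queryable store

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        while (streams.state() != KafkaStreams.State.RUNNING) {
            Thread.sleep(200); // wait until the local store is queryable
        }

        // Serve point lookups straight from the stream processing application:
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
            StoreQueryParameters.fromNameAndType("view-counts", QueryableStoreTypes.keyValueStore()));
        System.out.println("views for /home: " + store.get("/home"));
    }
}
```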
Join this Cloud Talk to explore how Large Language Models (LLMs) can revolutionize your data workflows. Learn to automate SQL query generation and stream results into Confluent using Vertex AI for real-time analytics and decision-making. Dive into integrating advanced AI into data pipelines, simplifying SQL creation, enhancing workflows, and leveraging Vertex AI for scalable machine learning. Discover how to optimize your data infrastructure and drive insights with Confluent’s Data Streaming Platform and cutting-edge AI technology.
This Session is hosted by a Google Cloud Next Sponsor.
Dive into the world of real-time data streaming with this introduction to Apache Kafka. This talk is tailored for developers, data engineers, and IT professionals who want to gain a foundational understanding of Kafka, a powerful open-source platform used for building scalable, event-driven applications. You will learn about:
Kafka fundamentals: the core concepts of Kafka, including topics, partitions, producers, and consumers
The Kafka ecosystem: brokers, clients, Schema Registry, and Kafka Connect
Stream processing: Kafka Streams and Apache Flink
Use cases: discover how data streaming with Kafka has transformed various industries
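To ground those fundamentals, a minimal producer/consumer pair looks roughly like this (topic name, group id, and broker address are placeholders):

```java
// Minimal producer/consumer sketch for the fundamentals above.
// Topic name "greetings", the group id, and the broker address are placeholders.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class HelloKafka {
    public static void main(String[] args) {
        // Producer: append a keyed record to the "greetings" topic.
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("greetings", "en", "hello, kafka"));
        }

        // Consumer: read the topic as part of a consumer group.
        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "greeting-readers");
        c.put("auto.offset.reset", "earliest");
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(List.of("greetings"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                    record.partition(), record.offset(), record.value());
            }
        }
    }
}
```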
Stream processing has evolved quickly: only a few years ago, stream processing was mostly simple real-time aggregations with limited throughput and consistency. Today, many stream processing applications have sophisticated business logic, strict correctness guarantees, high performance, low latency, and maintain terabytes of state without databases. Stream processing frameworks also abstract a lot of the low-level details away, such as routing the data streams, taking care of concurrent executions, and handling various failure scenarios while ensuring correctness.
This talk will give an introduction into Apache Flink, one of the most advanced open source stream processors that powers applications in Netflix, Uber, and Alibaba among others. In particular, we will go through the use cases that Flink was designed for, explain concepts like stateful and event-time stream processing, and discuss Flink's APIs and ecosystem.
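A small, hedged taste of the event-time concepts mentioned above: bounded-out-of-orderness watermarks plus a tumbling event-time window, summing values per key. The (key, value, epoch-millis) tuple shape is an illustrative assumption.

```java
// Small illustration of event-time processing in Flink: bounded-out-of-orderness
// watermarks plus a tumbling event-time window summing values per key.
// The (key, value, epoch-millis) tuple shape is an illustrative assumption.
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class EventTimeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                Tuple3.of("sensor-1", 3L, 1_000L),
                Tuple3.of("sensor-1", 5L, 61_000L),
                Tuple3.of("sensor-2", 7L, 2_000L))
           // Event time comes from the record itself; tolerate 10s of out-of-orderness.
           .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Tuple3<String, Long, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(10))
                    .withTimestampAssigner((event, ts) -> event.f2))
           .keyBy(event -> event.f0)
           .window(TumblingEventTimeWindows.of(Time.minutes(1)))
           .sum(1)    // per-key, per-minute sum of the value field
           .print();

        env.execute("event-time-sketch");
    }
}
```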
The data landscape is evolving rapidly, with generative AI poised to revolutionize insight generation and data culture. Join experts from Databricks, MongoDB, Confluent, and Dataiku for an exclusive executive discussion on harnessing gen AI's transformative potential. We'll explore how to break down multicloud data silos, empowering informed decision-making and unlocking your data's full value with gen AI. Discover strategies for integrating gen AI, addressing challenges, and building a future-proof, innovation-driven data culture.
Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot!
A discussion of why Kafka exists, why Confluent was created, the evolution of Kafka and data streaming, and the role of open source and its communities.
Restoring local state in Kafka Streams applications is indispensable for recovering after a failure or for moving stream processors between Kafka Streams clients. However, restoration has a reputation for being operationally problematic, because a Streams client occupied with restoring some stream processors prevents other stream processors that are ready from processing new records. When the state is large this can have a considerable impact on the overall throughput of the Streams application. Additionally, when failures interrupt restoration, restoration restarts from the beginning, thus negatively impacting throughput further.

In this talk, we will explain how Kafka Streams currently restores local state and processes records. We will show how we decouple processing from restoring by moving restoration to a dedicated thread and how throughput profits from this decoupling. We will present how we avoid restarting restoration from the beginning after a failure. Finally, we will talk about the concurrency and performance problems that we had to overcome and we will present benchmarks that show the effects of our improvements.
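The dedicated restoration thread described in this talk is internal to Kafka Streams, but the existing, user-facing knobs around restoration can be sketched: standby replicas keep warm copies of state so failover rarely restores from scratch, and a restore listener makes restoration progress observable. Application id, topic, store name, and replica count below are illustrative.

```java
// Not the new mechanism described above, but the existing knobs around restoration:
// standby replicas keep warm state copies, and a listener exposes restoration progress.
// Application id, topic, store name, and replica count are illustrative.
import java.util.Properties;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class RestorationSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stateful-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Keep one warm replica of each state store so failover rarely restores from scratch.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);

        // A small stateful topology so there is state to restore.
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("events")
               .groupByKey()
               .count(Materialized.as("event-counts"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        // Make restoration progress visible; must be set before start().
        streams.setGlobalStateRestoreListener(new StateRestoreListener() {
            @Override
            public void onRestoreStart(TopicPartition tp, String store, long start, long end) {
                System.out.printf("restoring %s: offsets %d to %d%n", store, start, end);
            }
            @Override
            public void onBatchRestored(TopicPartition tp, String store, long endOffset, long numRestored) {
                System.out.printf("%s: restored %d records%n", store, numRestored);
            }
            @Override
            public void onRestoreEnd(TopicPartition tp, String store, long totalRestored) {
                System.out.printf("%s: restoration complete (%d records)%n", store, totalRestored);
            }
        });
        streams.start();
    }
}
```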
Keep bad data out and refactor schemas with data quality rules.
In this session, we will explore the stream processing capabilities for Kafka and compare the three popular options: Kafka Streams, ksqlDB, and Apache Flink®. We will dive into the strengths and limitations of each technology, and compare them based on their ease of use, performance, scalability, and flexibility. By the end of the session, attendees will have a better understanding of the different options available for stream processing with Kafka, and which technology might be the best fit for their specific use case. This session is ideal for developers, data engineers, and architects who want to leverage the power of Kafka for real-time data processing.
Bio:
Before Jan Svoboda started his Apache Kafka journey at Confluent, he worked as an Advisory Platform Architect at Pivotal and DevOps Solutions Architect at IBM, among others. Jan joined Confluent in April 2020 as a Solutions Engineer, establishing microservices development as his favourite topic. Jan holds degrees in Management of Information Systems from UNYP and Computer Science from UCF.
In this session, David will demystify the misconceptions around the complexity of Apache Flink, touch on its use cases, and get you up to speed for your stream processing endeavor. All of that, in real-time.