In this 2-hour hands-on workshop, you'll build an end-to-end streaming analytics pipeline that captures live cryptocurrency prices, processes them in real-time, and uses AI to forecast the future. Ingest live crypto data into Apache Kafka using Kafka Connect; tame that chaos with Apache Flink's stream processing; freeze streams into queryable Apache Iceberg tables using Tableflow; and forecast price trends with Flink AI.
Talks & appearances from Confluent speakers
We were told to scale compute. But what if the real problem was never about big data, but about bad data access? In this talk, we’ll unpack two powerful, often misunderstood techniques—projection pushdown and predicate pushdown—and why they matter more than ever in a world where we want lightweight, fast queries over large datasets. These optimizations aren’t just academic—they’re the difference between querying a terabyte in seconds vs. minutes. We’ll show how systems like Flink and DuckDB leverage these techniques, what limits them (hello, Protobuf), and how smart schema and storage design, especially in formats like Iceberg and Arrow, can unlock dramatic speed gains. Along the way, we’ll highlight the importance of landing data in queryable formats, and why indexing and query engines matter just as much as compute. This talk is for anyone who wants to stop fully scanning their data lakes just to read one field.
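As a rough, self-contained illustration of what pushdown buys you (not code from the talk), the sketch below queries Parquet files through DuckDB's JDBC driver; the file path and column names are made up.

```java
// Minimal sketch of projection + predicate pushdown with DuckDB's JDBC driver.
// The Parquet path and column names (amount, country) are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PushdownSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement stmt = conn.createStatement()) {
            // Only the 'amount' column is read (projection pushdown), and row groups
            // whose min/max statistics exclude country = 'DE' are skipped entirely
            // (predicate pushdown), so most of the file is never touched.
            ResultSet rs = stmt.executeQuery(
                "SELECT sum(amount) " +
                "FROM read_parquet('events/*.parquet') " +
                "WHERE country = 'DE'");
            while (rs.next()) {
                System.out.println("total: " + rs.getDouble(1));
            }
        }
    }
}
```

Because only one column is decoded and entire row groups can be skipped via statistics, the scan touches a fraction of the data; with opaque row-oriented blobs such as Protobuf, the engine has no columnar layout or statistics to push into, which is the limitation the talk alludes to.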
Kafka 4.1 promotes KIP-932, “Queues for Kafka,” to preview status. Beyond the KIP's provocative title, it introduces "share groups" into Kafka, adopting some of the mechanisms found in queuing systems. However, there are fundamental differences from JMS or MQ. We will therefore explore the KIP in detail: its APIs and the problems it solves, as well as its limitations and potential future developments.
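To make the share-group model concrete, here is a heavily hedged sketch based on the consumer API proposed in KIP-932; class and method names follow the KIP and may differ in the released version, and the topic and group names are made up.

```java
// Hedged sketch of a KIP-932 share-group consumer, following the API proposed in the KIP.
// Names may differ in the released version; "orders" and "order-workers" are made up.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.AcknowledgeType;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaShareConsumer;

public class ShareGroupSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-workers"); // used as a share group, not a consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaShareConsumer<String, String> consumer = new KafkaShareConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        process(record.value());
                        consumer.acknowledge(record, AcknowledgeType.ACCEPT);  // done, don't redeliver
                    } catch (Exception e) {
                        consumer.acknowledge(record, AcknowledgeType.RELEASE); // make it available again
                    }
                }
                // Depending on configuration, acknowledgements may also be sent implicitly on the next poll.
                consumer.commitSync();
            }
        }
    }

    static void process(String value) { /* ... */ }
}
```

The queue-like behaviour comes from records being acknowledged individually and from several members of the share group being able to consume the same partition, unlike a classic consumer group.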
Moving data between operational systems and analytics platforms is often a painful process. Traditional pipelines that transfer data in and out of warehouses tend to become complex, brittle, and expensive to maintain over time.
Much of this complexity, however, is avoidable. Data in motion and data at rest—Kafka Topics and Iceberg Tables—can be treated as two sides of the same coin. By establishing an equivalence between Topics and Tables, it’s possible to transparently map between them and rethink how pipelines are built.
This talk introduces a declarative approach to bridging streaming and table-based systems. By shifting complexity into the data layer, we can decompose complex, imperative pipelines into simpler, more reliable workflows.
We’ll explore the design principles behind this approach, including schema mapping and evolution between Kafka and Iceberg, and how to build a system that can continuously materialize and optimize hundreds of thousands of topics as Iceberg tables.
Whether you're building new pipelines or modernizing legacy systems, this session will provide practical patterns and strategies for creating resilient, scalable, and future-proof data architectures.
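As one hedged illustration of treating a topic and a table as two views of the same data (not the system described in this talk), the sketch below uses Flink SQL from Java to declare a Kafka topic and an Iceberg catalog and to continuously materialize one into the other; the schema, topic name, warehouse path, and connector options are assumptions, and the Kafka and Iceberg connector jars must be on the classpath.

```java
// Rough sketch: declaratively materialize a Kafka topic into an Iceberg table with Flink SQL.
// Topic name, schema, warehouse path, and connector options are illustrative assumptions.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TopicToTableSketch {
    public static void main(String[] args) {
        TableEnvironment env =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // The stream side: a Kafka topic exposed as a dynamic table.
        env.executeSql(
            "CREATE TABLE orders_stream (" +
            "  order_id STRING, amount DOUBLE, ts TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'orders'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'scan.startup.mode' = 'earliest-offset'," +
            "  'format' = 'json')");

        // The table side: an Iceberg catalog backed by a local warehouse directory.
        env.executeSql(
            "CREATE CATALOG lake WITH (" +
            "  'type' = 'iceberg'," +
            "  'catalog-type' = 'hadoop'," +
            "  'warehouse' = 'file:///tmp/warehouse')");
        env.executeSql("CREATE DATABASE IF NOT EXISTS lake.db");
        env.executeSql(
            "CREATE TABLE IF NOT EXISTS lake.db.orders (" +
            "  order_id STRING, amount DOUBLE, ts TIMESTAMP(3))");

        // The 'pipeline' is a single declarative statement that runs continuously.
        env.executeSql("INSERT INTO lake.db.orders SELECT * FROM orders_stream");
    }
}
```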
Data streaming is a really difficult problem. Despite 10+ years of attempting to simplify it, teams building real-time data pipelines can spend up to 80% of their time optimizing it or fixing downstream output by handling bad data at the lake. All we want is a service that will be reliable, handle all kinds of data, connect with all kinds of systems, be easy to manage, and scale up and down as our systems change. Oh, it should also have super low latency and result in good data. Is it too much to ask?
In this presentation, you’ll learn the basics of data streaming and architecture patterns such as DLQ, used to tackle these challenges. We will then explore how to implement these patterns using Apache Flink and discuss the challenges that real-time AI applications bring to our infra. Difficult problems are difficult, and we offer no silver bullets. Still, we will share pragmatic solutions that have helped many organizations build fast, scalable, and manageable data streaming pipelines.
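One concrete shape of the DLQ pattern mentioned above is a Flink side output: records that fail processing are diverted to a secondary stream instead of failing the job. The parsing logic and sinks below are placeholders for illustration.

```java
// Sketch of a dead-letter-queue pattern in Flink using a side output:
// records that fail to parse go to a "dlq" stream instead of crashing the pipeline.
// The parsing logic and the print sinks are placeholders.
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class DlqSketch {
    // Side-output tag for raw records that could not be processed.
    static final OutputTag<String> DLQ = new OutputTag<String>("dlq") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> raw = env.fromElements("42", "17", "not-a-number");

        SingleOutputStreamOperator<Integer> parsed = raw.process(
            new ProcessFunction<String, Integer>() {
                @Override
                public void processElement(String value, Context ctx, Collector<Integer> out) {
                    try {
                        out.collect(Integer.parseInt(value)); // good record: main stream
                    } catch (NumberFormatException e) {
                        ctx.output(DLQ, value);               // bad record: dead-letter stream
                    }
                }
            });

        parsed.print();                    // in practice: sink to the real topic or table
        parsed.getSideOutput(DLQ).print(); // in practice: sink to a dead-letter Kafka topic

        env.execute("dlq-sketch");
    }
}
```

In a real pipeline both streams would be written to Kafka topics, with the dead-letter topic feeding reprocessing or alerting rather than being handled downstream at the lake.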
Abstract: TBA
Abstract: Detecting problems as they happen is essential in today’s fast-moving, data-driven world. In this talk, you’ll learn how to build a flexible, real-time anomaly detection pipeline using Apache Kafka and Apache Flink, backed by statistical and machine learning models. We’ll start by demystifying what anomaly really means, exploring the different types (point, contextual, and collective anomalies) and the difference between unintentional issues and intentional outliers like fraud or abuse.

Then, we’ll look at how anomaly detection is solved in practice: from classical statistical models like ARIMA to deep learning models like LSTM. You’ll learn how ARIMA breaks time series into AutoRegressive, Integrated, and Moving Average components, no math degree required (just a Python library). We’ll also uncover why forgetting is a feature, not a bug, when it comes to LSTMs, and how these models learn to detect complex patterns over time.

Throughout, we’ll show how Kafka handles high-throughput streaming data and how Flink enables low-latency, stateful processing to catch issues as they emerge. You’ll leave knowing not just how these systems work, but when to use each type of model depending on your data and goals. Whether you're monitoring system health, tracking IoT devices, or looking for fraud in transactions, this talk will give you the foundations and tools to detect the unexpected, before it becomes a problem.
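ARIMA and LSTM models don't fit in a short snippet, so as a deliberately simpler stand-in, the sketch below flags values more than three standard deviations from a per-key running mean inside a Flink keyed function; the (sensorId, value) shape and the thresholds are assumptions, not something taken from the talk.

```java
// Hedged, simplified stand-in for the models discussed above: a per-key rolling
// z-score detector in Flink. Flags a reading more than 3 standard deviations from
// the running mean. The (sensorId, value) shape and the thresholds are assumptions.
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ZScoreDetector
        extends KeyedProcessFunction<String, Tuple2<String, Double>, String> {

    // Running count, mean, and M2 (sum of squared deviations) per key, Welford-style.
    private transient ValueState<double[]> stats;

    @Override
    public void open(Configuration parameters) {
        stats = getRuntimeContext().getState(
            new ValueStateDescriptor<>("stats", double[].class));
    }

    @Override
    public void processElement(Tuple2<String, Double> reading, Context ctx, Collector<String> out)
            throws Exception {
        double[] s = stats.value();
        if (s == null) s = new double[] {0.0, 0.0, 0.0}; // count, mean, M2

        double x = reading.f1;
        if (s[0] >= 30) { // only score once we have a minimal history
            double std = Math.sqrt(s[2] / s[0]);
            if (std > 0 && Math.abs(x - s[1]) / std > 3.0) {
                out.collect("anomaly on " + reading.f0 + ": " + x);
            }
        }

        // Welford's online update of count, mean, and M2.
        s[0] += 1;
        double delta = x - s[1];
        s[1] += delta / s[0];
        s[2] += delta * (x - s[1]);
        stats.update(s);
    }
}
```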
Streaming data with Apache Kafka® has become the backbone of modern-day applications. While streams are ideal for continuous data flow, they lack built-in querying capability. Unlike databases with indexed lookups, Kafka's append-only logs are designed for high-throughput processing, not for on-demand querying. This forces teams to build additional infrastructure to enable query capabilities for streaming data. Traditional methods replicate this data into external stores such as relational databases like PostgreSQL for operational workloads and object storage like S3 with Flink, Spark, or Trino for analytical use cases. While useful sometimes, these methods deepen the divide between operational and analytical estates, creating silos, complex ETL pipelines, and issues with schema mismatches, freshness, and failures.

In this session, we’ll explore and see live demos of some solutions to unify the operational and analytical estates, eliminating data silos. We’ll start with stream processing using Kafka Streams, Apache Flink®, and SQL implementations, then cover integration of relational databases with real-time analytics databases such as Apache Pinot® and ClickHouse. Finally, we’ll dive into modern approaches like Apache Iceberg® with Tableflow, which simplifies data preparation by seamlessly representing Kafka topics and associated schemas as Iceberg or Delta tables in a few clicks. While there's no single right answer to this problem, as responsible system builders, we must understand our options and trade-offs to build robust architectures.
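One of the options in this space, serving lookups from the stream processor itself rather than an external store, can be sketched with Kafka Streams interactive queries; the topic, store name, and broker address below are made up.

```java
// Sketch of querying streaming data without an external database:
// a Kafka Streams count materialized to a local store and read via interactive queries.
// Topic name, store name, and bootstrap server are illustrative.
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class QueryableStreamSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("page-views")   // key = page, value ignored
               .groupByKey()
               .count(Materialized.as("view-counts")); // materialized, queryable store

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        while (streams.state() != KafkaStreams.State.RUNNING) {
            Thread.sleep(200); // wait until the local store is queryable
        }

        // Serve point lookups straight from the stream processing application:
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
            StoreQueryParameters.fromNameAndType("view-counts", QueryableStoreTypes.keyValueStore()));
        System.out.println("views for /home: " + store.get("/home"));
    }
}
```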
Join this Cloud Talk to explore how Large Language Models (LLMs) can revolutionize your data workflows. Learn to automate SQL query generation and stream results into Confluent using Vertex AI for real-time analytics and decision-making. Dive into integrating advanced AI into data pipelines, simplifying SQL creation, enhancing workflows, and leveraging Vertex AI for scalable machine learning. Discover how to optimize your data infrastructure and drive insights with Confluent’s Data Streaming Platform and cutting-edge AI technology.
This Session is hosted by a Google Cloud Next Sponsor.
Dive into the world of real-time data streaming with this introduction to Apache Kafka. This talk is tailored for developers, data engineers, and IT professionals who want to gain a foundational understanding of Kafka, a powerful open-source platform used for building scalable, event-driven applications. You will learn about:
Kafka fundamentals: the core concepts of Kafka, including topics, partitions, producers, and consumers
The Kafka ecosystem: brokers, clients, Schema Registry, and Kafka Connect
Stream processing: Kafka Streams and Apache Flink
Use cases: discover how data streaming with Kafka has transformed various industries
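To ground those fundamentals, a minimal producer/consumer pair looks roughly like this (topic name, group id, and broker address are placeholders):

```java
// Minimal producer/consumer sketch for the fundamentals above.
// Topic name "greetings", the group id, and the broker address are placeholders.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class HelloKafka {
    public static void main(String[] args) {
        // Producer: append a keyed record to the "greetings" topic.
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("greetings", "en", "hello, kafka"));
        }

        // Consumer: read the topic as part of a consumer group.
        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "greeting-readers");
        c.put("auto.offset.reset", "earliest");
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(List.of("greetings"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                    record.partition(), record.offset(), record.value());
            }
        }
    }
}
```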
Stream processing has evolved quickly: only a few years ago, stream processing was mostly simple real-time aggregations with limited throughput and consistency. Today, many stream processing applications have sophisticated business logic, strict correctness guarantees, high performance, low latency, and maintain terabytes of state without databases. Stream processing frameworks also abstract a lot of the low-level details away, such as routing the data streams, taking care of concurrent executions, and handling various failure scenarios while ensuring correctness.
This talk will give an introduction into Apache Flink, one of the most advanced open source stream processors that powers applications in Netflix, Uber, and Alibaba among others. In particular, we will go through the use cases that Flink was designed for, explain concepts like stateful and event-time stream processing, and discuss Flink's APIs and ecosystem.
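A small, hedged taste of the event-time concepts mentioned above: bounded-out-of-orderness watermarks plus a tumbling event-time window, summing values per key. The (key, value, epoch-millis) tuple shape is an illustrative assumption.

```java
// Small illustration of event-time processing in Flink: bounded-out-of-orderness
// watermarks plus a tumbling event-time window summing values per key.
// The (key, value, epoch-millis) tuple shape is an illustrative assumption.
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class EventTimeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                Tuple3.of("sensor-1", 3L, 1_000L),
                Tuple3.of("sensor-1", 5L, 61_000L),
                Tuple3.of("sensor-2", 7L, 2_000L))
           // Event time comes from the record itself; tolerate 10s of out-of-orderness.
           .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Tuple3<String, Long, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(10))
                    .withTimestampAssigner((event, ts) -> event.f2))
           .keyBy(event -> event.f0)
           .window(TumblingEventTimeWindows.of(Time.minutes(1)))
           .sum(1)    // per-key, per-minute sum of the value field
           .print();

        env.execute("event-time-sketch");
    }
}
```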
The data landscape is evolving rapidly, with generative AI poised to revolutionize insight generation and data culture. Join experts from Databricks, MongoDB, Confluent, and Dataiku for an exclusive executive discussion on harnessing gen AI's transformative potential. We'll explore how to break down multicloud data silos, empowering informed decision-making and unlocking your data's full value with gen AI. Discover strategies for integrating gen AI, addressing challenges, and building a future-proof, innovation-driven data culture.
Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot!
A discussion of why Kafka exists, why Confluent was created, the evolution of Kafka and data streaming, and the role of open source and its communities.
Restoring local state in Kafka Streams applications is indispensable for recovering after a failure or for moving stream processors between Kafka Streams clients. However, restoration has a reputation for being operationally problematic, because a Streams client occupied with restoring some stream processors prevents other stream processors that are ready from processing new records. When the state is large this can have a considerable impact on the overall throughput of the Streams application. Additionally, when failures interrupt restoration, restoration restarts from the beginning, thus negatively impacting throughput further.

In this talk, we will explain how Kafka Streams currently restores local state and processes records. We will show how we decouple processing from restoring by moving restoration to a dedicated thread and how throughput profits from this decoupling. We will present how we avoid restarting restoration from the beginning after a failure. Finally, we will talk about the concurrency and performance problems that we had to overcome and we will present benchmarks that show the effects of our improvements.
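The dedicated restoration thread described in this talk is internal to Kafka Streams, but the existing, user-facing knobs around restoration can be sketched: standby replicas keep warm copies of state so failover rarely restores from scratch, and a restore listener makes restoration progress observable. Application id, topic, store name, and replica count below are illustrative.

```java
// Not the new mechanism described above, but the existing knobs around restoration:
// standby replicas keep warm state copies, and a listener exposes restoration progress.
// Application id, topic, store name, and replica count are illustrative.
import java.util.Properties;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class RestorationSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stateful-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Keep one warm replica of each state store so failover rarely restores from scratch.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);

        // A small stateful topology so there is state to restore.
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("events")
               .groupByKey()
               .count(Materialized.as("event-counts"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        // Make restoration progress visible; must be set before start().
        streams.setGlobalStateRestoreListener(new StateRestoreListener() {
            @Override
            public void onRestoreStart(TopicPartition tp, String store, long start, long end) {
                System.out.printf("restoring %s: offsets %d to %d%n", store, start, end);
            }
            @Override
            public void onBatchRestored(TopicPartition tp, String store, long endOffset, long numRestored) {
                System.out.printf("%s: restored %d records%n", store, numRestored);
            }
            @Override
            public void onRestoreEnd(TopicPartition tp, String store, long totalRestored) {
                System.out.printf("%s: restoration complete (%d records)%n", store, totalRestored);
            }
        });
        streams.start();
    }
}
```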
Keep bad data out and refactor schemas with data quality rules.
In this session, we will explore the stream processing capabilities for Kafka and compare the three popular options: Kafka Streams, ksqlDB, and Apache Flink®. We will dive into the strengths and limitations of each technology, and compare them based on their ease of use, performance, scalability, and flexibility. By the end of the session, attendees will have a better understanding of the different options available for stream processing with Kafka, and which technology might be the best fit for their specific use case. This session is ideal for developers, data engineers, and architects who want to leverage the power of Kafka for real-time data processing.
Bio:
Before Jan Svoboda started his Apache Kafka journey at Confluent, he worked as an Advisory Platform Architect at Pivotal and DevOps Solutions Architect at IBM, among others. Jan joined Confluent in April 2020 as a Solutions Engineer, establishing microservices development as his favourite topic. Jan holds degrees in Management of Information Systems from UNYP and Computer Science from UCF.
In this session, David will demystify the misconceptions around the complexity of Apache Flink, touch on its use cases, and get you up to speed for your stream processing endeavor. All of that, in real-time.