talk-data.com

Topic: kafka streams (9 tagged activities)

[Activity trend chart: 2020-Q1 to 2026-Q1, peak 1/quarter]

Activities (9, newest first)

Struggled with the complexity of designing Kafka Streams applications? Without sufficient up-front architecture work, it's all too easy to stumble into misunderstandings, rework, or outright failure. Although standards like UML and the C4 model have guided software design for years, stream processing has lacked a visual framework of its own, until now.

KSTD (Kafka Streams Topology Design) introduces an open standard and component library for describing and visualising Kafka Streams topologies with Excalidraw. A small set of design principles ensures teams can keep diagrams simple while still capturing the important details, build trust in their designs, and streamline the development lifecycle.

You will learn how standardised diagrams support team alignment, and how KSTD fosters consistent and clear communication for Kafka Streams.

Design up-front, avoid mistakes, save time, and build trust.
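
To make this concrete: the textual counterpart of a KSTD diagram is the output of Kafka Streams' own Topology#describe(). A minimal sketch, with hypothetical topic names:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;

    public class DescribeTopology {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            // Hypothetical topology: read "orders", keep large ones, write "large-orders".
            builder.<String, String>stream("orders")
                   .filter((key, value) -> value.length() > 100)
                   .to("large-orders");

            Topology topology = builder.build();
            // describe() lists the processor nodes and how they connect --
            // the raw material that a KSTD diagram lays out visually.
            System.out.println(topology.describe());
        }
    }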

Streaming data with Apache Kafka® has become the backbone of modern applications. While streams are ideal for continuous data flow, they lack built-in querying capability. Unlike databases with indexed lookups, Kafka's append-only logs are designed for high-throughput processing, not on-demand querying, so teams must build additional infrastructure to query streaming data. Traditional methods replicate this data into external stores: relational databases such as PostgreSQL for operational workloads, and object storage such as S3 with Flink, Spark, or Trino for analytical use cases. Useful as they can be, these methods deepen the divide between the operational and analytical estates, creating silos, complex ETL pipelines, and problems with schema mismatches, freshness, and failures.

In this session, we'll explore, with live demos, solutions that unify the operational and analytical estates and eliminate data silos. We'll start with stream processing using Kafka Streams, Apache Flink®, and SQL implementations, then cover integrating relational databases with real-time analytics databases such as Apache Pinot® and ClickHouse. Finally, we'll dive into modern approaches like Apache Iceberg® with Tableflow, which simplifies data preparation by seamlessly representing Kafka topics and their associated schemas as Iceberg or Delta tables in a few clicks. While there is no single right answer to this problem, as responsible system builders we must understand our options and trade-offs to build robust architectures.
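
One of the approaches the session covers, keeping state queryable inside the stream processor itself, can be sketched in a few lines of Kafka Streams (the topic and store names here are hypothetical):

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class QueryableCounts {
        public static Topology build() {
            StreamsBuilder builder = new StreamsBuilder();
            // Hypothetical "orders" topic keyed by customer id.
            builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()))
                   .groupByKey()
                   // Materialize the running count as a named, queryable state store,
                   // instead of first replicating the topic into an external database.
                   .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as(
                       "orders-per-customer"));
            return builder.build();
        }
    }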

In a world increasingly reliant on real-time data processing, we require efficient access to live, distributed states. This talk delves into Interactive Queries v2 in Kafka Streams — a powerful feature that allows direct, real-time access to the state of distributed Kafka Streams applications.

We will set the context of stream processing and explain why access to the state of distributed applications is crucial for modern data-driven architectures. Through this lens, you will see how Interactive Queries bridge the gap between real-time analytics state and external consumption, enabling you to unlock new possibilities for monitoring, data exploration, and responsive application behavior.

Building on this foundation, we will walk through a running example illustrating how Interactive Queries can be leveraged to create a dynamic and responsive application, and we will guide you through the different types of state stores and how to query them.

Attendees will leave with a solid understanding of how Interactive Queries work, the types of problems they solve, and how they can be applied effectively to enhance the value of Kafka-based streaming applications.
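
As a taste of the API, a point lookup with Interactive Queries v2 might look like the following minimal sketch. It assumes a running application that materializes a key-value store and that the queried key is hosted locally; the store name and helper method are hypothetical:

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.query.KeyQuery;
    import org.apache.kafka.streams.query.QueryResult;
    import org.apache.kafka.streams.query.StateQueryRequest;
    import org.apache.kafka.streams.query.StateQueryResult;

    public class PointLookup {
        // Assumes a running KafkaStreams instance that materializes a
        // key-value store named "orders-per-customer" (hypothetical name).
        static Long countFor(KafkaStreams streams, String customerId) {
            StateQueryRequest<Long> request =
                StateQueryRequest.inStore("orders-per-customer")
                                 .withQuery(KeyQuery.<String, Long>withKey(customerId));

            StateQueryResult<Long> result = streams.query(request);
            QueryResult<Long> partitionResult = result.getOnlyPartitionResult();
            // null if no local partition produced a result (key absent or hosted
            // elsewhere); isSuccess() guards against stores that are still restoring.
            return (partitionResult != null && partitionResult.isSuccess())
                ? partitionResult.getResult()
                : null;
        }
    }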

Dive into the world of real-time data streaming with this introduction to Apache Kafka. This talk is tailored for developers, data engineers, and IT professionals who want to gain a foundational understanding of Kafka, a powerful open-source platform for building scalable, event-driven applications. You will learn about:

Kafka fundamentals: the core concepts of Kafka, including topics, partitions, producers, and consumers (see the producer sketch after this list)

The Kafka ecosystem: brokers, clients, Schema Registry, and Kafka Connect

Stream processing: Kafka Streams and Apache Flink

Use cases: discover how data streaming with Kafka has transformed various industries
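
As a first taste of those fundamentals, here is a minimal producer sketch; the broker address, topic, key, and value are hypothetical:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class HelloProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed local broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Records with the same key land on the same partition,
                // which preserves per-key ordering for downstream consumers.
                producer.send(new ProducerRecord<>("page-views", "user-42", "/pricing"));
            }
        }
    }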

Restoring local state in Kafka Streams applications is indispensable for recovering after a failure and for moving stream processors between Kafka Streams clients. However, restoration has a reputation for being operationally problematic, because a Streams client occupied with restoring some stream processors prevents other, ready stream processors from processing new records. When the state is large, this can considerably reduce the overall throughput of the Streams application. Worse, when a failure interrupts restoration, restoration restarts from the beginning, hurting throughput further.

In this talk, we will explain how Kafka Streams currently restores local state and processes records. We will show how we decouple processing from restoration by moving restoration to a dedicated thread, and how throughput profits from this decoupling. We will present how we avoid restarting restoration from the beginning after a failure. Finally, we will discuss the concurrency and performance problems we had to overcome, and we will present benchmarks that show the effects of our improvements.
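
The talk describes internal improvements, but the restoration cost is easy to observe from the outside through the public StateRestoreListener callback. A minimal sketch (not the optimization presented in the talk, just a way to measure the problem it addresses):

    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.streams.processor.StateRestoreListener;

    // Logs how much restoration work each state store causes.
    // Register before start(): streams.setGlobalStateRestoreListener(new LoggingRestoreListener());
    public class LoggingRestoreListener implements StateRestoreListener {
        @Override
        public void onRestoreStart(TopicPartition partition, String storeName,
                                   long startingOffset, long endingOffset) {
            System.out.printf("Restoring %s (%s): %d records to go%n",
                              storeName, partition, endingOffset - startingOffset);
        }

        @Override
        public void onBatchRestored(TopicPartition partition, String storeName,
                                    long batchEndOffset, long numRestored) {
            // Called after each restored batch; useful for progress metrics.
        }

        @Override
        public void onRestoreEnd(TopicPartition partition, String storeName,
                                 long totalRestored) {
            System.out.printf("Restored %s (%s): %d records%n",
                              storeName, partition, totalRestored);
        }
    }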

Kestra is a data orchestrator whose Enterprise edition is built on Kafka Streams. Kafka brings high availability and performance without compromise, but the developer of a Kafka Streams application must understand its inner workings to avoid painful mistakes. Ludovic will share the tricks that allowed him to make Kestra highly distributed and reliable, without needing a Jepsen PhD!

In this session, we will explore the stream processing capabilities for Kafka and compare the three popular options: Kafka Streams, ksqlDB, and Apache Flink®. We will dive into the strengths and limitations of each technology, and compare them based on their ease of use, performance, scalability, and flexibility. By the end of the session, attendees will have a better understanding of the different options available for stream processing with Kafka, and which technology might be the best fit for their specific use case. This session is ideal for developers, data engineers, and architects who want to leverage the power of Kafka for real-time data processing.
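
One point of comparison worth anchoring: unlike Flink's cluster-and-job model, Kafka Streams is just a library embedded in a plain Java application. A minimal sketch, with hypothetical topic names and an assumed local broker:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class FilterApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-app"); // hypothetical id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, String>stream("payments")
                   .filter((key, value) -> value.contains("EUR"))
                   .to("eur-payments");

            // The whole processor runs inside this JVM; scaling out means
            // starting more instances, not submitting jobs to a cluster.
            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }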

Bio:

Before Jan Svoboda started his Apache Kafka journey at Confluent, he worked as an Advisory Platform Architect at Pivotal and DevOps Solutions Architect at IBM, among others. Jan joined Confluent in April 2020 as a Solutions Engineer, establishing microservices development as his favourite topic. Jan holds degrees in Management of Information Systems from UNYP and Computer Science from UCF.

Salomon will showcase our open-source approach, which helps data engineers define complex Kafka pipelines consisting of multiple interconnected Kafka Streams applications, Kafka producers/consumers, and Kafka connectors in a straightforward manner. Our ready-to-use approach provides automated lifecycle management of Kafka pipelines on Kubernetes, handling deployment, update, destroy, reset, and clean-up transparently. In addition, we demonstrate how this approach provides a comprehensible view of all Kafka pipelines and their metrics.