In this 2-hour hands-on workshop, you'll build an end-to-end streaming analytics pipeline that captures live cryptocurrency prices, processes them in real time, and uses AI to forecast the future. You'll ingest live crypto data into Apache Kafka using Kafka Connect; tame that chaos with Apache Flink's stream processing; freeze streams into queryable Apache Iceberg tables using Tableflow; and forecast price trends with Flink AI.
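For a taste of the kind of pipeline the workshop builds, here is a minimal sketch of the Flink processing step, written as Flink SQL run from Java; the topic name, schema, and connector options are assumptions, and the Tableflow and Flink AI stages are not shown:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CryptoPipelineSketch {
    public static void main(String[] args) {
        TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical source: a Kafka topic fed by a Kafka Connect crypto connector.
        env.executeSql(
            "CREATE TABLE crypto_prices (" +
            "  symbol STRING, price DOUBLE, ts TIMESTAMP(3)," +
            "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'crypto-prices'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format' = 'json'," +
            "  'scan.startup.mode' = 'latest-offset')");

        // A one-minute average price per symbol -- the kind of aggregate that
        // could later land in a queryable Iceberg table.
        env.executeSql(
            "SELECT symbol, AVG(price) AS avg_price," +
            "       TUMBLE_END(ts, INTERVAL '1' MINUTE) AS window_end " +
            "FROM crypto_prices " +
            "GROUP BY symbol, TUMBLE(ts, INTERVAL '1' MINUTE)").print();
    }
}
```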
Remember when debugging streaming data pipelines felt like playing detective at a crime scene where the evidence kept shifting? Well, grab your magnifying glass, because we're about to turn you into the Sherlock Holmes of the streaming world. We'll simulate a disruptive change in an order-processing pipeline that captures database changes with Debezium, processes them through Apache Flink, and tracks lineage metadata with OpenLineage and Marquez.
While upgrading Flink to its latest versions to enable more AI-related capabilities, one can easily run into tricky savepoint incompatibilities that render existing state snapshots unusable for recovery. This is especially problematic in the case of pipelines with large state. In such cases, doing a backfill can take too long and using the State Processor API leads to downtime or breaking the exactly-once delivery guarantee.
In this talk, I'll share a state migration pattern that I applied to one of our Flink jobs using regular streaming mode. It involves creating a new stateful operator that conforms to the new requirements, allowing a compatible savepoint to be created. Leveraging side outputs and custom key traversal, the existing state is forwarded to the new operator while regular processing continues uninterrupted.
We’ll explore the core problem and understand the pitfalls and trade-offs of existing solutions such as the State Processor API. Then, a deep-dive into the migration pattern will follow: ensuring correct state handoff between operator versions, setting up triggers to migrate all keys and other technicalities. Lastly, a few words about cleaning up seamlessly. With this session I will add a nice pattern to your toolbox that you can easily apply next time you run into state migration challenges.
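To make the pattern concrete, here is a rough sketch (an editor's illustration, not the speaker's code) of how a legacy operator might hand each key's state to the new operator through a side output; the `Event` and `KeyedState` types and the migration trigger are hypothetical, using Flink 1.x-style APIs:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Legacy operator: when a per-key migration trigger arrives (injected by a
// custom key traversal upstream), it emits that key's state to a side output,
// which is routed into the new, savepoint-compatible operator.
public class LegacyOperator extends KeyedProcessFunction<String, Event, Event> {
    // Side output carrying migrated per-key state to the new operator.
    public static final OutputTag<KeyedState> MIGRATION =
        new OutputTag<KeyedState>("state-migration") {};

    private transient ValueState<KeyedState> state;

    @Override
    public void open(Configuration parameters) {
        state = getRuntimeContext().getState(
            new ValueStateDescriptor<>("legacy-state", KeyedState.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Event> out) throws Exception {
        if (event.isMigrationTrigger()) {
            KeyedState current = state.value();
            if (current != null) {
                ctx.output(MIGRATION, current); // hand this key's state over
                state.clear();                  // this operator no longer owns it
            }
            return;
        }
        // ... regular processing continues uninterrupted ...
        out.collect(event);
    }
}

// Hypothetical placeholder types.
class Event {
    boolean migrationTrigger;
    boolean isMigrationTrigger() { return migrationTrigger; }
}
class KeyedState { long value; }
```

Downstream, `.getSideOutput(LegacyOperator.MIGRATION)`, keyed the same way, would feed the new operator, so its state is populated before the legacy operator is cleaned up.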
At Fresha, we pioneered putting StarRocks to the test in production for real-time analytical workloads. One of the first challenges we faced was getting all the data there reliably and efficiently. We had to think about historical data and real-time data, and orchestrate all of it so we could move fast without breaking too many things. Our tools of choice: Airflow, StarRocks Pipes, and Apache Flink. In this talk, I'll share how we built our data pipelines using Apache Flink and Airflow, and what worked and what didn't for us. Along the way, we'll explore how Flink helps ensure data consistency, handles failures gracefully, and keeps our real-time workloads running strong.
Kafka and Flink tend to get lumped in as "data services", in the sense that they process data, but in comparison to traditional databases they differ quite dramatically in functionality and utility. In this talk, we'll run through the lifetime of a write in Postgres to establish a baseline, understanding all the different services that data hits on its way down to the disk. Then we'll walk through writing data to a Kafka topic, and what 'writing' (or really, streaming) data to a Flink workflow looks like from a similar systems perspective. Along the way, we'll understand the key differences between the services and why some are more suited to long-term data storage than others.
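To ground the Kafka half of that comparison, a minimal producer write might look like the sketch below; the topic, key, and config values are made up for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaWriteSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // acks=all: the write is acknowledged only once replicated to the
        // in-sync replicas -- a durability decision to contrast with
        // Postgres fsyncing its WAL.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // An append to the partition's log -- not an update-in-place
            // as in a B-tree page.
            producer.send(new ProducerRecord<>("orders", "order-42", "{\"amount\": 9.99}"));
            producer.flush();
        }
    }
}
```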
The next generation of streaming isn't about faster pipelines, but about smarter connections. DeltaJoin, a new operator in Apache Flink, reimagines stream joins by moving from brute-force state to change-driven computation. Paired with Fluss, Flink's purpose-built storage layer, it enables systems that are real-time, scalable, and cost-efficient. Anton will show how DeltaJoin and Fluss shift streaming architecture from ephemeral flows to durable, queryable state that bridges real-time processing with lakehouse patterns. Drawing on production experience, he'll demonstrate how these innovations reduce join costs, simplify architectures, and unlock new possibilities for real-time analytics. Attendees will leave with a vision of Flink 2.x as the backbone for event-driven systems and modern data platforms.
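DeltaJoin's own interface isn't sketched here; for contrast, below is the kind of regular stream-stream join (hypothetical tables on the `datagen` connector) whose unbounded state growth is exactly what DeltaJoin and Fluss are designed to avoid:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RegularJoinBaseline {
    public static void main(String[] args) {
        TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        env.executeSql("CREATE TABLE orders (order_id STRING, user_id STRING) " +
            "WITH ('connector' = 'datagen')");
        env.executeSql("CREATE TABLE payments (order_id STRING, amount DOUBLE) " +
            "WITH ('connector' = 'datagen')");

        // A regular stream-stream join: both sides are retained in Flink
        // state indefinitely so late matches can still be found -- the
        // "brute-force state" the talk describes. DeltaJoin, backed by
        // Fluss's queryable storage, instead looks matches up at the source.
        env.executeSql(
            "SELECT o.order_id, o.user_id, p.amount " +
            "FROM orders o JOIN payments p ON o.order_id = p.order_id").print();
    }
}
```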
By leveraging tools like Jaeger and New Relic, we will uncover how to gain a full view of your microservices, even in the face of Apache Kafka's asynchronous nature. Join us for a live demo with a simple Java Spring-Boot app, where we will walk through both automatic and manual instrumentation to capture rich telemetry. We will also touch on infrastructure-level observability, pulling metrics and traces from Apache Kafka brokers and Apache Flink.
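As a flavor of the manual instrumentation mentioned above, a hand-rolled span with the OpenTelemetry Java API might look like this; the service, span, and attribute names are placeholders, not taken from the demo:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class OrderService {
    // Tracer from the globally registered OpenTelemetry SDK, which could be
    // configured to export to Jaeger or New Relic.
    private final Tracer tracer = GlobalOpenTelemetry.getTracer("order-service");

    public void handleOrder(String orderId) {
        // A manual span around business logic; the Kafka client's automatic
        // instrumentation can link it to downstream consumer spans via
        // context propagation in record headers.
        Span span = tracer.spanBuilder("handle-order").startSpan();
        try (Scope scope = span.makeCurrent()) {
            span.setAttribute("order.id", orderId);
            // ... produce to Kafka, call other services ...
        } finally {
            span.end();
        }
    }
}
```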
Data streaming is a really difficult problem. Despite 10+ years of attempts to simplify it, teams building real-time data pipelines can spend up to 80% of their time optimizing those pipelines or fixing downstream output by handling bad data at the lake. All we want is a service that will be reliable, handle all kinds of data, connect with all kinds of systems, be easy to manage, and scale up and down as our systems change. Oh, it should also have super low latency and result in good data. Is it too much to ask?
In this presentation, you’ll learn the basics of data streaming and architecture patterns such as DLQ, used to tackle these challenges. We will then explore how to implement these patterns using Apache Flink and discuss the challenges that real-time AI applications bring to our infra. Difficult problems are difficult, and we offer no silver bullets. Still, we will share pragmatic solutions that have helped many organizations build fast, scalable, and manageable data streaming pipelines.
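One of those patterns, the DLQ, can be sketched in Flink with a side output; the parsing logic and tag name below are illustrative, not from the talk:

```java
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Route records that fail parsing to a dead-letter side output instead of
// failing the whole job; the DLQ stream can be sinked to its own Kafka topic.
public class ParseWithDlq extends ProcessFunction<String, Long> {
    public static final OutputTag<String> DLQ = new OutputTag<String>("dlq") {};

    @Override
    public void processElement(String raw, Context ctx, Collector<Long> out) {
        try {
            out.collect(Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            ctx.output(DLQ, raw); // keep the bad record for inspection/replay
        }
    }
}
```

In use, `stream.process(new ParseWithDlq())` yields the clean stream, and `.getSideOutput(ParseWithDlq.DLQ)` yields the dead letters, each routed to its own sink.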
In this session, we'll walk through how Apache Flink was used to enable near real-time operational insights from manufacturing IIoT datasets. The goal: deliver actionable KPIs to production teams with sub-30-second latency, using streaming data pipelines built with Kafka, Flink, and Grafana. We'll cover the key architectural patterns that made this possible, including handling structured data joins, managing out-of-order events, and integrating with downstream systems like PostgreSQL and Grafana. We'll also share real-world performance benchmarks, lessons learned from scaling tests, and practical considerations for deploying Flink in a production-grade, low-latency analytics pipeline. The session will also include a live demo.
If you're building Flink-based solutions for time-sensitive operations—whether in manufacturing, IoT, or other domains—this talk will provide proven insights from the field.
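As a rough illustration of the out-of-order handling mentioned above (types, sizes, and thresholds are hypothetical, not from the talk), the sketch below assigns bounded-out-of-orderness watermarks and computes a windowed per-machine KPI:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class IiotKpiSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<SensorReading> readings = env
            .fromElements(new SensorReading("m1", 42.0, 1_700_000_000_000L))
            // Tolerate sensors reporting up to 10s late -- out-of-order
            // handling bought at the price of a little extra latency.
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<SensorReading>forBoundedOutOfOrderness(Duration.ofSeconds(10))
                    .withTimestampAssigner((r, ts) -> r.timestamp));

        // A per-machine KPI every 20s of event time, comfortably inside
        // a sub-30-second latency budget.
        readings.keyBy(r -> r.machineId)
            .window(TumblingEventTimeWindows.of(Time.seconds(20)))
            .maxBy("value")
            .print();

        env.execute("iiot-kpi-sketch");
    }
}

// Hypothetical reading type.
class SensorReading {
    public String machineId;
    public double value;
    public long timestamp;
    public SensorReading() {}
    public SensorReading(String m, double v, long t) { machineId = m; value = v; timestamp = t; }
}
```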
According to Wikipedia, Infrastructure as Code is the process of managing and provisioning computer data center resources through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. This also applies to resources and reference data, connector plugins, connector configurations, and stream processes to clean up the data.
In this talk, we are going to discuss the use cases based on the Network Rail Data Feeds, the scripts used to spin up the environment and cluster in Confluent Cloud, as well as the different components required for the ingress and processing of the data.
This particular environment is used as a teaching tool for Event Stream Processing for Kafka Streams, ksqlDB, and Flink. Some examples of further processing and visualisation will also be provided.
Streaming data with Apache Kafka® has become the backbone of modern-day applications. While streams are ideal for continuous data flow, they lack built-in querying capability. Unlike databases with indexed lookups, Kafka's append-only logs are designed for high-throughput processing, not for on-demand querying. This necessitates teams to build additional infrastructure to enable query capabilities for streaming data. Traditional methods replicate this data into external stores such as relational databases like PostgreSQL for operational workloads and object storage like S3 with Flink, Spark, or Trino for analytical use cases. While sometimes useful, these methods deepen the divide between operational and analytical estates, creating silos, complex ETL pipelines, and issues with schema mismatches, freshness, and failures.

In this session, we’ll explore and see live demos of some solutions to unify the operational and analytical estates, eliminating data silos. We’ll start with stream processing using Kafka Streams, Apache Flink®, and SQL implementations, then cover integration of relational databases with real-time analytics databases such as Apache Pinot® and ClickHouse. Finally, we’ll dive into modern approaches like Apache Iceberg® with Tableflow, which simplifies data preparation by seamlessly representing Kafka topics and associated schemas as Iceberg or Delta tables in a few clicks. While there's no single right answer to this problem, as responsible system builders, we must understand our options and trade-offs to build robust architectures.
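As a small taste of the first approach, a Kafka Streams topology that maintains a continuously updated count per key might look like this sketch; the topic names and serdes are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class PageViewCounts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-counts");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // A continuously updated count per page -- one way to serve
        // operational lookups straight from the stream, with no separate
        // database to keep in sync.
        KTable<String, Long> counts = builder
            .stream("page-views")   // hypothetical topic keyed by page id
            .groupByKey()
            .count();
        counts.toStream().to("page-view-counts",
            Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```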
Dive into the world of real-time data streaming with this introduction to Apache Kafka. This talk is tailored for developers, data engineers, and IT professionals who want to gain a foundational understanding of Kafka, a powerful open-source platform used for building scalable, event-driven applications. You will learn about:
Kafka fundamentals: the core concepts of Kafka, including topics, partitions, producers, and consumers (a minimal consumer sketch follows this list)
The Kafka ecosystem: brokers, clients, Schema Registry, and Kafka Connect
Stream processing: Kafka Streams and Apache Flink
Use cases: discover how data streaming with Kafka has transformed various industries
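The consumer sketch below ties the fundamentals together; the topic, group id, and broker address are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class HelloConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "hello-group"); // consumers in a group share partitions
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                        r.partition(), r.offset(), r.key(), r.value());
                }
            }
        }
    }
}
```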
This talk walks through the process of creating real-time data pipelines using Flink. It introduces how to connect Flink with various data sources (like Kafka or relational databases), focusing on transforming and enriching data streams. This talk is useful for understanding how Flink integrates with other components in a typical data processing pipeline.
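A bare-bones version of such a pipeline, using the Flink Kafka connector's `KafkaSource` (topic, group id, and addresses hypothetical):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaToFlinkPipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Connect Flink to a Kafka topic.
        KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setTopics("clicks")
            .setGroupId("flink-pipeline")
            .setStartingOffsets(OffsetsInitializer.earliest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-clicks")
            // A trivial transformation step; enrichment would typically join
            // against reference data here (e.g., via lookup or broadcast state).
            .map(String::toUpperCase)
            .print();

        env.execute("kafka-to-flink");
    }
}
```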
Stream Processing has evolved quickly in a short time: only a few years ago, stream processing was mostly simple real-time aggregations with limited throughput and consistency. Today, many stream processing applications have sophisticated business logic, strict correctness guarantees, high performance, low latency, and maintain terabytes of state without databases. Stream processing frameworks also abstract a lot of the low-level details away, such as routing the data streams, taking care of concurrent executions, and handling various failure scenarios while ensuring correctness.
This talk will give an introduction to Apache Flink, one of the most advanced open source stream processors, which powers applications at Netflix, Uber, and Alibaba, among others. In particular, we will go through the use cases that Flink was designed for, explain concepts like stateful and event-time stream processing, and discuss Flink's APIs and ecosystem.
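As a taste of stateful stream processing, a fault-tolerant per-key counter might look like this sketch; the input type and names are illustrative:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Stateful processing in a nutshell: a running count per key, held in Flink's
// managed state and checkpointed for fault tolerance -- no external database.
public class RunningCount extends KeyedProcessFunction<String, String, Tuple2<String, Long>> {
    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
            new ValueStateDescriptor<>("count", Types.LONG));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Tuple2<String, Long>> out) throws Exception {
        long next = (count.value() == null ? 0L : count.value()) + 1;
        count.update(next);
        out.collect(Tuple2.of(ctx.getCurrentKey(), next));
    }
}
```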