talk-data.com

Topic: flink (3 tagged)

Activity Trend (2020-Q1 to 2026-Q1): peak of 2 activities per quarter

Activities

Showing filtered results

Filtering by: Tides of Change: Real-Time Flow with Postgres, Kafka & Flink

While upgrading Flink to its latest versions to enable more AI-related capabilities, one can easily run into tricky savepoint incompatibilities that render existing state snapshots unusable for recovery. This is especially problematic for pipelines with large state: doing a backfill can take too long, and using the State Processor API leads to downtime or breaks the exactly-once delivery guarantee.

In this talk, I’ll share a state migration pattern that I applied to one of our Flink jobs using regular streaming mode. It involves creating a new stateful operator that conforms to the new requirements, allowing a compatible savepoint to be created. Leveraging side outputs and custom key traversal, the existing state is forwarded to the new operator while regular processing continues uninterrupted.
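As a rough, hypothetical sketch of the idea (not the speaker's actual code): the existing keyed operator keeps processing events as usual while also emitting each key's state to a side output, from which a new, savepoint-compatible operator can rebuild it. The class name, counter state, and types below are illustrative assumptions.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class OldCountingOperator extends KeyedProcessFunction<String, String, Long> {

    // Side output carrying (key, state) pairs destined for the replacement operator.
    public static final OutputTag<Tuple2<String, Long>> MIGRATION =
            new OutputTag<Tuple2<String, Long>>("state-migration") {};

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String event, Context ctx, Collector<Long> out) throws Exception {
        // Regular processing continues uninterrupted.
        long updated = (count.value() == null ? 0L : count.value()) + 1L;
        count.update(updated);
        out.collect(updated);

        // Forward this key's current state so the new operator can absorb it.
        // Note: this naive version only reaches keys that see at least one event
        // after deployment; the talk's "custom key traversal" presumably goes further.
        ctx.output(MIGRATION, Tuple2.of(ctx.getCurrentKey(), updated));
    }
}
```

Downstream, the migrated records would be picked up from the side output with getSideOutput and routed into the new operator.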

We’ll explore the core problem and understand the pitfalls and trade-offs of existing solutions such as the State Processor API. Then comes a deep dive into the migration pattern: ensuring correct state handoff between operator versions, setting up triggers to migrate all keys, and other technicalities. Lastly, a few words about cleaning up seamlessly. This session will add a handy pattern to your toolbox that you can easily apply the next time you run into state migration challenges.
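Continuing the illustrative sketch above (again, an assumption about the approach rather than the talk's actual code), the handoff could be wired so the new operator receives live events on one input and the migrated (key, state) records on the other. OldCountingOperator comes from the previous sketch; NewCountingOperator is a hypothetical KeyedCoProcessFunction that absorbs the migrated state.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MigrationWiringSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.fromElements("a", "b", "a");  // stand-in source

        // The old operator keeps serving traffic while exporting its state.
        SingleOutputStreamOperator<Long> oldOut = events
                .keyBy(k -> k)
                .process(new OldCountingOperator())
                .uid("old-counting-operator");

        DataStream<Tuple2<String, Long>> migrated =
                oldOut.getSideOutput(OldCountingOperator.MIGRATION);

        // The new operator absorbs migrated state on one input and live events on
        // the other; a stable uid keeps future savepoints restorable.
        events.keyBy(k -> k)
                .connect(migrated.keyBy(t -> t.f0))
                .process(new NewCountingOperator())  // hypothetical KeyedCoProcessFunction
                .uid("new-counting-operator")
                .print();

        env.execute("state-migration-wiring-sketch");
    }
}
```

Once all keys have been handed off, the old operator and the wiring around it can be removed in a follow-up deployment, which is presumably the "cleaning up" part of the talk.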

At Fresha, we were pioneers in putting StarRocks to the test in production for real-time analytical workloads. One of the first challenges we faced was getting all the data there reliably and efficiently: we had to handle both historical and real-time data and orchestrate all of it so we could move fast without breaking too many things. Our tools of choice: Airflow, StarRocks Pipes, and Apache Flink. In this talk, I’ll share how we built our data pipelines using Apache Flink and Airflow, and what did and didn’t work for us. Along the way, we’ll explore how Flink helps ensure data consistency, handles failures gracefully, and keeps our real-time workloads running strong.
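The abstract doesn't include code, but for context, Flink's consistency and failure handling in a pipeline like this typically rest on checkpointing and restart strategies. A minimal, generic configuration (not Fresha's actual setup; intervals and retry counts are placeholders) might look like this:

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedPipelineSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Exactly-once checkpoints every 60s keep state consistent across failures.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);

        // Bounded restarts so a transient failure recovers without manual intervention.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(10)));

        // A real job would read from sources and write to StarRocks here;
        // this stand-in pipeline just keeps the sketch runnable.
        env.fromElements(1, 2, 3).print();

        env.execute("checkpointed-pipeline-sketch");
    }
}
```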

Kafka and Flink tend to get lumped together as "data services" in the sense that they process data, but compared to traditional databases they differ dramatically in functionality and utility. In this talk, we'll run through the lifetime of a write in Postgres to establish a baseline, understanding all the different services the data hits on its way down to disk. Then we'll walk through writing data to a Kafka topic, and what 'writing' (or really, streaming) data to a Flink workflow looks like from a similar systems perspective. Along the way, we'll understand the key differences between these services and why some are better suited to long-term data storage than others.
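To make the Kafka side of that comparison concrete, here is a minimal, illustrative producer write (the broker address and topic name are assumptions): unlike a Postgres INSERT that updates heap pages and the WAL, the record is appended to a partition's log and, with acks=all, acknowledged only once the in-sync replicas have persisted it.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaWriteSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("acks", "all");  // acknowledge only after in-sync replicas persist the record
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The write is an append to a partition's log, not an update in place.
            producer.send(new ProducerRecord<>("events", "user-42", "signup"));
        }
    }
}
```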