talk-data.com


Showing 6 results

Activities & events

Title & Speakers | Event
Csanád Bakos – Data Engineer @ Vinted

While upgrading Flink to its latest versions to enable more AI-related capabilities, one can easily run into tricky savepoint incompatibilities that render existing state snapshots unusable for recovery. This is especially problematic for pipelines with large state. In such cases, a backfill can take too long, and using the State Processor API leads to downtime or breaks the exactly-once delivery guarantee.

In this talk, I’ll share a state migration pattern that I applied to one of our Flink jobs using regular streaming mode. It involves creating a new stateful operator that conforms to the new requirements, allowing a compatible savepoint to be created. Leveraging side outputs and custom key traversal, the existing state is forwarded to the new operator while regular processing continues uninterrupted.

We’ll explore the core problem and examine the pitfalls and trade-offs of existing solutions such as the State Processor API. Then we’ll take a deep dive into the migration pattern: ensuring correct state handoff between operator versions, setting up triggers to migrate all keys, and other technicalities. Lastly, a few words on cleaning up seamlessly. This session will add a handy pattern to your toolbox that you can apply the next time you run into state migration challenges.

flink state processor api
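The handoff the abstract describes can be sketched in language-agnostic terms. The following is an illustrative simulation, not actual Flink DataStream API code: the operator classes, the in-memory dictionaries standing in for keyed state, and the drain step standing in for a side output are all assumptions made for the sketch.

```python
# Toy model of the migration pattern: drain the old operator's keyed state
# through a "side output" into a new operator with a different state layout,
# while the new operator keeps processing fresh records.

class OldOperator:
    """Operator with the legacy state layout (key -> running total)."""
    def __init__(self):
        self.state = {}

    def process(self, key, value):
        self.state[key] = self.state.get(key, 0) + value
        return (key, self.state[key])

    def drain_state(self):
        """On a migration trigger, emit every key's state (the 'side output')."""
        for key, value in list(self.state.items()):
            yield (key, value)
            del self.state[key]  # state is handed off, so drop it here

class NewOperator:
    """Operator with the new, savepoint-compatible layout (key -> dict)."""
    def __init__(self):
        self.state = {}

    def accept_migrated(self, key, value):
        self.state[key] = {"total": value}

    def process(self, key, value):
        entry = self.state.setdefault(key, {"total": 0})
        entry["total"] += value
        return (key, entry["total"])

# Regular processing on the old operator...
old, new = OldOperator(), NewOperator()
old.process("a", 1)
old.process("a", 2)
old.process("b", 5)

# ...then the trigger fires: forward existing state key by key while the
# new operator also handles fresh records (processing is uninterrupted).
for key, value in old.drain_state():
    new.accept_migrated(key, value)
new.process("a", 4)

print(new.state)  # {'a': {'total': 7}, 'b': {'total': 5}}
```

Once all keys have been drained and the new operator owns the state, a savepoint taken against the new operator is compatible with the upgraded job, which is the point of the pattern.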
Tides of Change: Real-Time Flow with Postgres, Kafka & Flink
Csanád Bakos – Data Engineer @ Vinted

Talk by Csanád Bakos, Data Engineer at Vinted.

Nicoleta Lazar – Sr. Data Engineer @ Fresha

Talk by Nicoleta Lazar, Senior Data Engineer at Fresha.

Nicoleta Lazar – Sr. Data Engineer @ Fresha

At Fresha, we became pioneers by putting StarRocks to the test in production for real-time analytical workloads. One of the first challenges we faced was getting all the data there reliably and efficiently. We had to handle historical data and real-time data, and orchestrate all of it so that we could move fast without breaking too many things. Our tools of choice: Airflow, StarRocks Pipes, and Apache Flink. In this talk, I’ll share how we built our data pipelines using Apache Flink and Airflow, and what worked and what didn’t for us. Along the way, we’ll explore how Flink helps ensure data consistency, handles failures gracefully, and keeps our real-time workloads running strong.

Airflow starrocks pipes flink
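A minimal configuration-style sketch of how such orchestration might look in Airflow: load historical data first, then submit the streaming job. The DAG id, schedule, task names, and the shell commands are illustrative assumptions, not Fresha's actual pipeline.

```python
# Hypothetical Airflow DAG: backfill historical data into StarRocks, then
# submit a Flink job for the real-time stream. All names and commands below
# are placeholders for illustration.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="starrocks_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Historical load via StarRocks Pipes (placeholder script)
    load_historical = BashOperator(
        task_id="load_historical",
        bash_command="./scripts/run_starrocks_pipe_load.sh",
    )
    # Submit the Flink streaming job for real-time data (placeholder jar)
    submit_flink = BashOperator(
        task_id="submit_flink_job",
        bash_command="flink run -d ./jobs/realtime_to_starrocks.jar",
    )
    load_historical >> submit_flink
```

The `>>` dependency encodes the ordering concern from the abstract: the real-time path only starts once the historical backfill has landed, so the two sources don't race.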
Celeste Hogan – Developer Advocate @ Snowflake

Kafka and Flink tend to get lumped together as "data services" in the sense that they process data, but compared to traditional databases they differ dramatically in functionality and utility. In this talk, we'll trace the lifetime of a write in Postgres to establish a baseline, understanding all the services the data hits on its way down to disk. Then we'll walk through writing data to a Kafka topic, and what 'writing' (or really, streaming) data to a Flink workflow looks like from a similar systems perspective. Along the way, we'll see the key differences between these services and why some are better suited to long-term data storage than others.

postgresql Kafka flink
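The contrast the abstract draws can be illustrated with a toy append-only log. This is a teaching model, not Kafka's actual implementation: a Kafka-style "write" is an append that returns an offset, and readers track their own position, whereas a Postgres write flows through the WAL and buffer pool and can update rows in place.

```python
# Toy append-only log illustrating the Kafka-style write path: nothing is
# updated in place, a write just appends and returns the record's offset.

class PartitionLog:
    def __init__(self):
        self.records = []  # append-only

    def append(self, value) -> int:
        """A 'write' appends a record and returns its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int):
        """Consumers read from an offset they manage themselves."""
        return self.records[offset:]

log = PartitionLog()
log.append({"user": 1, "event": "click"})
off = log.append({"user": 2, "event": "view"})

# Unlike an UPDATE in Postgres, a new value for the same key is simply
# another appended record; retention/compaction, not overwrite, reclaims space.
log.append({"user": 1, "event": "click"})

print(off)               # 1
print(len(log.read(0)))  # 3
```

That "append and move on" shape is one reason a log broker and a database suit different retention roles: the log never promises random-access mutation, so durability and replication can be much simpler per write.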
Celeste Hogan – Developer Advocate @ Snowflake

Talk by Celeste Hogan, Developer Advocate at Snowflake.

Snowflake
Tides of Change: Real-Time Flow with Postgres, Kafka & Flink