talk-data.com

Topic: debezium (5 tagged)

Activity Trend: peak of 2 per quarter, 2020-Q1 to 2026-Q1

Activities: 5, newest first

Remember when debugging streaming data pipelines felt like playing detective at a crime scene where the evidence kept shifting? Well, grab your magnifying glass, because we're about to turn you into the Sherlock Holmes of the streaming world. We'll simulate a disruptive change in an order processing pipeline that captures database changes with Debezium, processes them through Apache Flink, and tracks lineage metadata with OpenLineage and Marquez.
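For context, lineage tools like Marquez consume OpenLineage run events describing which datasets a job read and wrote. A minimal event might look roughly like the sketch below; the namespaces, job names, and dataset names are illustrative placeholders, not taken from the talk:

```json
{
  "eventType": "COMPLETE",
  "eventTime": "2024-01-01T00:00:00Z",
  "producer": "https://example.com/flink-job",
  "run": { "runId": "d290f1ee-6c54-4b01-90e6-d701748f0851" },
  "job": { "namespace": "orders-pipeline", "name": "enrich-orders" },
  "inputs": [ { "namespace": "kafka", "name": "shop.public.orders" } ],
  "outputs": [ { "namespace": "kafka", "name": "orders.enriched" } ]
}
```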

In this talk we will look into the details of how Kleinanzeigen, a leader in the classifieds business in Germany, built a data migration system using Apache Kafka and Debezium that migrated millions of users' data from a legacy platform to a new one and allowed bi-directional data sync between them in real time. We will also discover how the system allowed users' data to be updated on both platforms (partially, under certain conditions) while keeping the entire system in sync. Finally, we will learn how the system leveraged a logical clock to implement a custom synchronization algorithm that helped avoid infinite update loops between the platforms.
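The abstract does not spell out the synchronization algorithm, but the loop-avoidance idea it hints at can be sketched with a per-record logical clock: each side only applies a remote change that is strictly newer than its local copy, so an echoed update is dropped rather than re-emitted. The `Replica` class and all names below are hypothetical, not Kleinanzeigen's actual implementation:

```python
# Hypothetical sketch: break infinite update loops in bi-directional sync
# by tagging every record with a logical clock (a monotonic counter) and
# applying only remote changes that are strictly newer than the local copy.

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (value, logical clock)

    def local_update(self, key, value):
        # A local write bumps the clock and emits a change event.
        _, clock = self.data.get(key, (None, 0))
        self.data[key] = (value, clock + 1)
        return key, value, clock + 1

    def apply_remote(self, key, value, clock):
        # Apply only strictly newer changes; re-emit them to the other side.
        _, local_clock = self.data.get(key, (None, 0))
        if clock > local_clock:
            self.data[key] = (value, clock)
            return key, value, clock
        return None  # stale change or our own echo: drop it, loop ends

legacy, new = Replica("legacy"), Replica("new")
event = legacy.local_update("user:1", "alice@example.com")
echo = new.apply_remote(*event)            # applied on the new platform
assert legacy.apply_remote(*echo) is None  # echo is not newer, so dropped
```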

Abstract: You've been tasked with implementing a data streaming pipeline for propagating data changes from your operational Postgres database to a search index in OpenSearch. Data views in OpenSearch should be denormalized for fast querying, and of course there should be no noticeable impact on the production database. In this session we'll discuss how to build this data pipeline using two popular open-source projects: Debezium for log-based change data capture (CDC) and Apache Flink for stream processing. Join us for this talk and learn about:

* Setting up change data streams with Debezium
* Efficiently building nested data structures from 1:n joins
* Deployment options: Kafka Connect vs. Flink CDC
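In the Kafka Connect deployment option, a Debezium Postgres change stream is typically set up by registering a connector configuration like the sketch below; the connector class and property keys are Debezium's, while the hostname, credentials, and table names are placeholders for this example:

```json
{
  "name": "orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "shop",
    "topic.prefix": "shop",
    "table.include.list": "public.orders,public.order_lines",
    "plugin.name": "pgoutput"
  }
}
```

With this configuration, row-level changes to the listed tables are read from Postgres's write-ahead log (via the `pgoutput` logical decoding plugin) and published to Kafka topics prefixed with `shop`, leaving the production database itself untouched apart from the replication slot.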