talk-data.com

Oscar Ligthart

Speaker · Senior Data Engineer at Vinted

2 talks


Talks & appearances

2 activities · Newest first


At Vinted, Europe’s largest second-hand marketplace, over 20 decentralized data teams generate, transform, and build products on petabytes of data. Each team uses its own tools, workflows, and expertise. Coordinating data pipeline creation across such diverse teams presents significant challenges, including complex inter-team dependencies, inconsistent scheduling solutions, and rapidly evolving requirements.

This talk is aimed at data engineers, platform engineers, and technical leads with experience in workflow orchestration. It will demonstrate how we empower teams at Vinted to define data pipelines quickly and reliably. We will present our user-friendly abstraction layer built on top of Apache Airflow, enhanced by a Python code generator. This abstraction simplifies upgrades and migrations, removes scheduler complexity, and supports Vinted’s rapid growth. Attendees will learn how Python abstractions and code generation can standardize pipeline development across diverse teams, reduce operational complexity, and enable greater flexibility and control in large-scale data organizations. Through practical lessons and real-world examples of our abstraction interface, we will offer insights into designing scheduler-agnostic architectures for successful data pipeline orchestration.
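The abstract does not show the actual interface, but the idea of a scheduler-agnostic abstraction plus a code generator can be sketched as follows. This is a minimal illustration, not Vinted’s real code: the `PipelineSpec` dataclass and `generate_dag_source` function are hypothetical names, and the emitted Airflow snippet assumes Airflow 2.x conventions.

```python
from dataclasses import dataclass

# Hypothetical declarative spec -- illustrative only, not Vinted's interface.
# Teams describe *what* runs and in what order; the generator decides *how*
# that maps onto the scheduler, so the scheduler can be swapped later.
@dataclass
class PipelineSpec:
    name: str
    schedule: str
    tasks: dict[str, list[str]]  # task_id -> upstream task_ids

def generate_dag_source(spec: PipelineSpec) -> str:
    """Render Airflow DAG source code from a scheduler-agnostic spec."""
    lines = [
        "from airflow import DAG",
        "from airflow.operators.empty import EmptyOperator",
        "import pendulum",
        "",
        f"with DAG(dag_id={spec.name!r}, schedule={spec.schedule!r},",
        "         start_date=pendulum.datetime(2024, 1, 1)) as dag:",
    ]
    # Declare one operator per task.
    for task_id in spec.tasks:
        lines.append(f"    {task_id} = EmptyOperator(task_id={task_id!r})")
    # Wire up dependencies from the declarative edge list.
    for task_id, upstreams in spec.tasks.items():
        for up in upstreams:
            lines.append(f"    {up} >> {task_id}")
    return "\n".join(lines)

spec = PipelineSpec(
    name="dbt_daily",
    schedule="@daily",
    tasks={"extract": [], "transform": ["extract"], "load": ["transform"]},
)
print(generate_dag_source(spec))
```

Because teams only edit specs, an Airflow upgrade or a migration to another scheduler becomes a change to the generator rather than to dozens of team-owned DAG files.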

Vinted is the biggest second-hand marketplace in Europe, with multiple business verticals. Our data ecosystem has over 20 decentralized teams responsible for generating, transforming, and building data products from petabytes of data. This creates a demanding environment where inter-team dependencies, varied expertise with scheduling tools, and diverse use cases need to be managed efficiently. To tackle these challenges, we have centralized our approach by leveraging Apache Airflow to orchestrate data dependencies across teams.

In this session, we will present how we utilize a code generator to streamline the creation of Airflow code for numerous dbt repositories, dockerized jobs, and Vertex AI pipelines. With this approach, we simplify the complexity and offer our users the flexibility required to accommodate their use cases. We will share our sensor-callback strategy, which we developed to manage task dependencies, overcoming the limitations of traditional dataset triggers. This approach requires a data asset registry to monitor global dependencies and SLOs, and serves as a safeguard during CI processes for detecting potential breaking changes.
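The sensor-callback strategy and asset registry are not specified in detail in the abstract; the sketch below shows one plausible shape for the idea, under the assumption of a simple in-memory registry (a real system would back this with a shared service). The `AssetRegistry` class and its method names are hypothetical.

```python
# Hypothetical sensor-callback scheme: producers call mark_ready() from a
# post-task callback, consumers poll is_ready() from a sensor. Unlike plain
# dataset triggers, the registry holds a global view of cross-team
# dependencies, so CI can also query it to detect breaking changes.
class AssetRegistry:
    def __init__(self) -> None:
        self._ready: dict[str, set[str]] = {}  # asset -> ready partitions

    def mark_ready(self, asset: str, partition: str) -> None:
        """Producer callback: record that an asset partition is available."""
        self._ready.setdefault(asset, set()).add(partition)

    def is_ready(self, asset: str, partition: str) -> bool:
        """Consumer sensor: poll whether an upstream partition exists."""
        return partition in self._ready.get(asset, set())

    def consumers_of(self, asset: str) -> bool:
        """CI safeguard hook: does anything depend on this asset at all?"""
        return asset in self._ready


registry = AssetRegistry()
# Upstream team's task finishes and fires its on-success callback:
registry.mark_ready("orders_dbt_model", "2024-06-01")
# Downstream team's sensor polls before its pipeline starts:
assert registry.is_ready("orders_dbt_model", "2024-06-01")
assert not registry.is_ready("orders_dbt_model", "2024-06-02")
```

The design choice worth noting is the decoupling: producers and consumers never import each other's DAG code, they only agree on asset names, which is what makes SLO monitoring and CI-time breaking-change detection possible from one place.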