Julien Le Dem

Activities

2

talks

Chief Architect Astronomer

Frequent Collaborators

Harel Shein (Datadog) 3 · Tobias Macey 3 · Willy Lulciuc (WeWork) 3 · Eric Veleker 2 · Ernie Ostic (Manta) 2 · Sheeri Cabral 2

Talks & appearances

Showing 2 of 14 activities

Simplifying Data Lineage: How OpenLineage Empowers Airflow and Beyond

2025-07-01 · Airflow Summit 2025

session

with Julien Le Dem (Astronomer), Harel Shein (Datadog)

Airflow Flink dbt Spark

OpenLineage has simplified collecting lineage metadata across the data ecosystem by standardizing its representation in an extensible model. It has enabled a whole ecosystem of tools that improve data pipeline reliability and make troubleshooting in production environments easier. In this talk, we’ll briefly introduce the OpenLineage model and explore how this metadata is collected from Airflow, Spark, dbt, and Flink. We’ll demonstrate how to extract valuable insights and outline practical benefits and common challenges when building ingestion, processing, and storage for OpenLineage data. We will also briefly show how OpenLineage events can be used to observe data pipelines exhaustively and the benefits that brings.

Why Datadog Chose Airflow 3: Multi-Tenancy, Observability, and the Future of Event-Driven Workflows

2025-07-01 · Airflow Summit 2025

session

with Zach Gottesman, Julien Le Dem (Astronomer)

Airflow Datadog Luigi

Datadog is a world-class data platform ingesting more than 100 trillion events a day, providing real-time insights. Before Airflow’s prominence, we built batch processing on Luigi, Spotify’s open-source orchestrator. As Airflow gained wide adoption, we evaluated adopting the major improvements of release 2.0, but opted to build our own orchestrator instead to realize our dataset-centric, event-driven vision. Meanwhile, the 3.0 release aligned Airflow with the same vision we pursued internally, as a modern asset-driven orchestrator. It showed how futile it was to build our own orchestrator against the momentum of the community. We evaluated several orchestrators and decided to join forces with the Airflow project. This talk follows our journey from building a custom orchestrator to adopting and contributing to Airflow 3. We’ll share our thought process, our asset partitions use case, and how we’re working with the community to materialize the Data Awareness (AIP-73) vision. Partition-based incremental scheduling is core to our orchestration model. It enables scalable pipelines, and Datadog’s Data Observability product makes them observable by providing visibility into pipeline health.