talk-data.com

Topic

Astronomer

airflow data_orchestration cloud

Activity Trend: peak of 9 activities per quarter (2020-Q1 to 2026-Q1)

Top Events

Airflow Summit 2025: 9
Airflow Summit 2024: 8
Airflow Summit 2023: 6
Data Engineering Podcast: 3
Airflow Summit 2022: 2
Airflow Summit 2021: 2
Data + AI Summit 2025: 2
PyData London 2025: 1
Airflow Summit 2020: 1
Data Council 2023: 1
Data Skeptic: 1

Top Speakers

Ry Walker (Astronomer): 4
Tatiana Al-Chueyr Martins (Astronomer): 4
Pankaj Koti: 3
Tobias Macey: 3
Viraj Parekh: 2
Vikram Koka (Astronomer): 2
Ryan Hatter: 2
Rahul Vats: 2
Pankaj Singh: 2
Kenten Danas: 2
Maggie Stark: 2
Kyle Polich: 1

Activities

Filtered by speaker: Tatiana Al-Chueyr Martins

Benchmarking the Performance of Dynamically Generated DAGs

2025-07-01 · Airflow Summit 2025

session

by Rahul Vats, Tatiana Al-Chueyr Martins (Astronomer)

Airflow Cosmos GitHub Kubernetes

As teams scale their Airflow workflows, a common question is: “My DAG has 5,000 tasks; how long will it take to run in Airflow?” Beyond execution time, users often face other challenges with dynamically generated DAGs, such as:

- Delayed visualization in the Airflow UI after deployment.
- High resource consumption, leading to Kubernetes pod evictions and out-of-memory errors.

While estimating resource utilization in a distributed data platform is complex, benchmarking can provide crucial insights. In this talk, we’ll share our approach to benchmarking dynamically generated DAGs with Astronomer Cosmos (https://github.com/astronomer/astronomer-cosmos), covering:

- Designing representative and extensible baseline tests.
- Setting up isolated, distributed infrastructure for benchmarking.
- Running reproducible performance tests.
- Measuring DAG run times and task throughput.
- Evaluating CPU and memory consumption to optimize deployments.

By the end of this session, you will have practical benchmarks and strategies for evaluating DAG performance in Airflow and making informed decisions based on them.
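The measurement ideas above (run time, throughput, memory as task count grows) can be illustrated at toy scale with the standard library alone. This is a minimal sketch, not the Cosmos benchmark suite described in the talk: `build_synthetic_dag` and `benchmark` are hypothetical helpers, and the "DAG" is only a dict of task dependencies whose construction and topological ordering loosely stand in for the work done when a dynamically generated DAG is parsed. It reports the best wall-clock time over a few repeats, the resulting ordering throughput, and peak traced Python memory.

```python
from __future__ import annotations

import time
import tracemalloc
from graphlib import TopologicalSorter


def build_synthetic_dag(n_tasks: int, fan_in: int = 2) -> dict[str, set[str]]:
    """Hypothetical generator: task i depends on the `fan_in` tasks before it."""
    return {
        f"task_{i}": {f"task_{j}" for j in range(max(0, i - fan_in), i)}
        for i in range(n_tasks)
    }


def benchmark(n_tasks: int, repeats: int = 3) -> dict[str, float]:
    """Time DAG construction plus topological ordering; record peak traced memory."""
    timings: list[float] = []
    peak_bytes = 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        dag = build_synthetic_dag(n_tasks)
        order = list(TopologicalSorter(dag).static_order())
        elapsed = time.perf_counter() - start
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        if len(order) != n_tasks:
            raise RuntimeError("topological ordering lost tasks")
        timings.append(elapsed)
        peak_bytes = max(peak_bytes, peak)
    best = min(timings)  # the minimum over repeats is least affected by background noise
    return {
        "n_tasks": n_tasks,
        "best_seconds": best,
        "tasks_ordered_per_second": n_tasks / best if best > 0 else float("inf"),
        "peak_mib": peak_bytes / 2**20,
    }


if __name__ == "__main__":
    for n in (500, 1_000, 5_000):
        print(benchmark(n))
```

Running `benchmark(n)` for increasing `n` gives a rough scaling curve. Note that `tracemalloc` only sees allocations made through Python's allocator, not the whole process footprint that matters for pod evictions, so a real setup would also sample container-level CPU and memory.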

Boosting dbt-core workflows performance with Airflow’s Deferrable capabilities

2025-07-01 · Airflow Summit 2025

session

by Pankaj Singh, Pankaj Koti, Tatiana Al-Chueyr Martins (Astronomer)

Airflow Cloud Computing Cosmos dbt GitHub

Efficiently handling long-running workflows is crucial for scaling modern data pipelines. Apache Airflow’s deferrable operators help by offloading tasks during idle periods, freeing worker slots while still tracking progress. This session explores how Cosmos 1.9 (https://github.com/astronomer/astronomer-cosmos) integrates Airflow’s deferrable capabilities to improve how dbt (https://github.com/dbt-labs/dbt-core) is orchestrated in production, with insights from the recent contributions that introduced this functionality. Key takeaways:

- Deferrable Operators: how they work and why they’re ideal for long-running dbt tasks.
- Integrating with Cosmos: the refactoring and enhancements that enable deferrable behaviour across platforms.
- Performance Gains: resource savings and task throughput improvements from deferrable execution.
- Challenges & Future Enhancements: lessons learned, compatibility, and ideas for broader support.

Whether you are orchestrating dbt models on a cloud warehouse or managing large-scale transformations, this session offers practical strategies to reduce resource contention and boost pipeline performance.
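The deferral idea can be sketched in plain `asyncio`, assuming a hypothetical `FakeRemoteJob` in place of a real dbt run on a warehouse. In Airflow, a deferred task gives up its worker slot while a trigger, running as a coroutine in the triggerer's event loop, waits for the external work to finish. Here `triggerer_loop` loosely plays that role by awaiting many polling coroutines concurrently on one event loop. None of these names are Airflow or Cosmos APIs.

```python
from __future__ import annotations

import asyncio
import time


class FakeRemoteJob:
    """Hypothetical stand-in for a dbt run executing remotely (e.g. in a warehouse)."""

    def __init__(self, job_id: str, duration: float) -> None:
        self.job_id = job_id
        self._done_at = time.monotonic() + duration

    def is_done(self) -> bool:
        return time.monotonic() >= self._done_at


async def wait_for_job(job: FakeRemoteJob, poll_interval: float = 0.01) -> str:
    """Trigger-like coroutine: poll the job, yielding to the event loop between checks."""
    while not job.is_done():
        await asyncio.sleep(poll_interval)
    return job.job_id


async def triggerer_loop(jobs: list[FakeRemoteJob]) -> list[str]:
    """A single event loop watches every deferred job concurrently."""
    return list(await asyncio.gather(*(wait_for_job(job) for job in jobs)))


def run_deferred(n_jobs: int, duration: float) -> tuple[list[str], float]:
    """Submit jobs (the short 'worker' phase), then hand all waiting to one loop."""
    jobs = [FakeRemoteJob(f"model_{i}", duration) for i in range(n_jobs)]
    start = time.monotonic()
    finished = asyncio.run(triggerer_loop(jobs))
    return finished, time.monotonic() - start


if __name__ == "__main__":
    done, seconds = run_deferred(20, 0.1)
    print(f"{len(done)} jobs finished in {seconds:.2f}s of wall time")
```

Because each poll yields control through `await asyncio.sleep`, twenty 0.1-second jobs finish in roughly 0.1 seconds of wall time instead of the 2 seconds that waiting on them one at a time would take, and no "worker" is held during the wait.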

Productionising dbt-core with Airflow

2025-07-01 · Airflow Summit 2025

session

by Pankaj Singh, Pankaj Koti, Tatiana Al-Chueyr Martins (Astronomer)

Airflow Analytics Analytics Engineering Cosmos dbt

As a popular open-source library for analytics engineering, dbt is often combined with Airflow. Orchestrating and executing dbt models as DAGs adds a layer of control over tasks and observability, and provides a reliable, scalable environment for running dbt models. This workshop is a step-by-step guide to Cosmos, a popular open-source package from Astronomer that helps you quickly run your dbt Core projects as Airflow DAGs and Task Groups with just a few lines of code. We’ll walk through:

- Running and visualising your dbt transformations
- Managing dependency conflicts
- Defining database credentials (profiles)
- Configuring source and test nodes
- Using dbt selectors
- Customising arguments per model
- Addressing performance challenges
- Leveraging deferrable operators
- Visualising dbt docs in the Airflow UI
- An example of how to deploy to production
- Troubleshooting

We encourage participants to bring their own dbt project to follow along with this step-by-step workshop.
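The idea of running a dbt project as Airflow tasks, together with narrowing it down with a selector, can be illustrated by reading model dependencies and turning them into a task graph. This sketch does not use the Cosmos API. It assumes a heavily simplified, manifest-like dict (real dbt manifests carry many more fields); `select_models` is a hypothetical, rough analogue of a `tag:` selector, and `build_task_graph` keeps only the edges between selected models.

```python
from __future__ import annotations

from graphlib import TopologicalSorter

# Heavily simplified, manifest-like input (assumption: real dbt manifests
# hold many more fields). Keys are dbt-style unique_ids.
NODES: dict[str, dict] = {
    "model.shop.stg_orders": {
        "resource_type": "model",
        "tags": ["nightly"],
        "depends_on": {"nodes": ["source.shop.raw.orders"]},
    },
    "model.shop.stg_customers": {
        "resource_type": "model",
        "tags": ["nightly"],
        "depends_on": {"nodes": ["source.shop.raw.customers"]},
    },
    "model.shop.orders_summary": {
        "resource_type": "model",
        "tags": ["nightly", "finance"],
        "depends_on": {"nodes": ["model.shop.stg_orders", "model.shop.stg_customers"]},
    },
    "model.shop.adhoc_report": {
        "resource_type": "model",
        "tags": [],
        "depends_on": {"nodes": ["model.shop.orders_summary"]},
    },
}


def select_models(nodes: dict[str, dict], tag: str) -> set[str]:
    """Rough analogue of a `tag:<name>` selector: models carrying `tag`."""
    return {
        uid
        for uid, node in nodes.items()
        if node["resource_type"] == "model" and tag in node["tags"]
    }


def build_task_graph(nodes: dict[str, dict], selected: set[str]) -> dict[str, set[str]]:
    """One task per selected model; keep only edges between selected models."""

    def task_id(uid: str) -> str:
        return uid.split(".")[-1]  # e.g. "model.shop.stg_orders" -> "stg_orders"

    return {
        task_id(uid): {
            task_id(up) for up in nodes[uid]["depends_on"]["nodes"] if up in selected
        }
        for uid in selected
    }


if __name__ == "__main__":
    graph = build_task_graph(NODES, select_models(NODES, "nightly"))
    print(list(TopologicalSorter(graph).static_order()))
```

Upstream nodes outside the selection (here the sources and `adhoc_report`) are simply dropped, which is one of the choices a renderer has to make whenever a selector cuts the dependency graph.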

Integrating dbt with Airflow: Overcoming performance hurdles

2024-07-01 · Airflow Summit 2024

session

by Pankaj Koti, Tatiana Al-Chueyr Martins (Astronomer)

Airflow Cosmos dbt GitHub

The integration between dbt and Airflow is a popular topic in the community, both in previous editions of Airflow Summit and at Coalesce, as well as in the #airflow-dbt Slack channel. Astronomer Cosmos (https://github.com/astronomer/astronomer-cosmos/) stands out as one of the libraries that strive to enhance this integration, with over 300k downloads per month. During its development, we’ve encountered various performance challenges in both scheduling and task execution. While we’ve managed to address some, others remain to be resolved. This talk describes how Cosmos works, the improvements made over the last 1.5 years, and the roadmap. It also aims to collect feedback from the community on how we can further improve the experience of running dbt in Airflow.
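One general technique for scheduling-side cost, when a dbt project has to be inspected every time the DAG file is parsed, is to cache the parsed result and invalidate it only when project files change. The sketch below shows that technique with a content hash. `CachedProjectParser` and its `_parse` step (which merely lists `.sql` files) are hypothetical stand-ins, not Cosmos's actual caching mechanism.

```python
from __future__ import annotations

import hashlib
import os
import tempfile


class CachedProjectParser:
    """Hypothetical helper: cache an expensive project parse by content hash."""

    def __init__(self) -> None:
        self._cache: dict[str, list[str]] = {}
        self.parse_calls = 0  # how many times the expensive step actually ran

    @staticmethod
    def fingerprint(project_dir: str) -> str:
        """Hash every file's relative path and bytes in a deterministic order."""
        digest = hashlib.sha256()
        for root, dirs, files in os.walk(project_dir):
            dirs.sort()  # in-place sort makes os.walk visit subdirectories in a stable order
            for name in sorted(files):
                path = os.path.join(root, name)
                digest.update(os.path.relpath(path, project_dir).encode() + b"\0")
                with open(path, "rb") as fh:
                    digest.update(fh.read())
        return digest.hexdigest()

    def _parse(self, project_dir: str) -> list[str]:
        """Stand-in for a slow step (e.g. invoking dbt); here it only lists .sql files."""
        self.parse_calls += 1
        models: list[str] = []
        for _root, _dirs, files in os.walk(project_dir):
            models.extend(name[:-4] for name in files if name.endswith(".sql"))
        return sorted(models)

    def models(self, project_dir: str) -> list[str]:
        key = self.fingerprint(project_dir)
        if key not in self._cache:
            self._cache[key] = self._parse(project_dir)
        return list(self._cache[key])  # copy so callers cannot mutate the cache


def demo() -> tuple[list[str], list[str], int]:
    """Build a throwaway project, parse it twice, edit it, then parse again."""
    parser = CachedProjectParser()
    with tempfile.TemporaryDirectory() as project:
        models_dir = os.path.join(project, "models")
        os.makedirs(models_dir)
        for name in ("orders", "customers"):
            with open(os.path.join(models_dir, f"{name}.sql"), "w") as fh:
                fh.write("select 1")
        first = parser.models(project)
        parser.models(project)  # files unchanged: served from the cache
        with open(os.path.join(models_dir, "payments.sql"), "w") as fh:
            fh.write("select 2")
        second = parser.models(project)  # fingerprint changed: parsed again
    return first, second, parser.parse_calls
```

Hashing file contents rather than modification times keeps the cache key stable across copies of the same project, at the cost of reading every file on each check, which is still far cheaper than a full parse.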