Astronomer

What's New in Airflow 3.1

2025-12-17 · NYC Airflow Meetup at Astronomer HQ!

talk

Brent Bovenzi (Staff Frontend Engineer at Astronomer), Ash Berlin-Taylor (Airflow PMC member & Director Airflow Engineering at Astronomer)

Airflow

Airflow 3.1 is here, bringing new features and enhancements that make orchestration faster, more flexible, and easier to manage at scale. In this session, we’ll walk through the most impactful changes in 3.1, including improvements to the user experience, DAG management, and core functionality. Whether you’re running Airflow in production or just getting started, you’ll leave with a clear picture of what’s new, why it matters, and how to take advantage of it.

What's New in Airflow 3.1

2025-12-10 · Philadelphia Apache Airflow® Chapter Launch at Paladar

talk

Tony Costanza (Sales Engineer)

Airflow

Airflow 3.1 is here, bringing new features and enhancements that make orchestration faster, more flexible, and easier to manage at scale. In this session, we’ll walk through the most impactful changes in 3.1, including improvements to the user experience, DAG management, and core functionality. Whether you’re running Airflow in production or just getting started, you’ll leave with a clear picture of what’s new, why it matters, and how to take advantage of it.

From Chaos to Cosmos: Simplifying DAG Creation in Apache Airflow

2025-09-17 · Berlin Apache Airflow Meetup at GetYourGuide!

talk

Tatiana Al-Chueyr Martins (Principal Software Engineer)

Airflow dbt cosmos astronomer cosmos

dbt has become the de facto standard for transforming data in modern analytics stacks. But as projects grow, so does the question: where should dbt run in production, and how can we make it faster? In this talk, we’ll compare the performance trade-offs between running dbt natively and orchestrating it through Airflow using Cosmos, with a focus on workflow efficiency at scale. Using a 200-model dbt project as a case study, we’ll show how workflow execution time in Cosmos was reduced from 15 minutes to just 5 minutes. We’ll also discuss opportunities to push performance further, ranging from better DAG optimization to warehouse-aware scheduling strategies. Whether you’re a data engineer, analytics engineer, or platform owner, you’ll leave with practical strategies to optimize dbt execution and inspiration for what’s next in large-scale orchestration.

Lessons from Airflow gone wrong: How to set yourself up to scale successfully

2025-07-01 · Airflow Summit 2025

session

Annie Friedman (Resident Solutions Architect)

Airflow Data Quality Pandas

Ever seen a DAG go rogue and deploy itself? Or try to time travel back to 1999? Join us for a light-hearted yet painfully relatable look at how not to scale your Airflow deployment to avoid chaos and debugging nightmares. We’ll cover the classics: hardcoded secrets, unbounded retries (hello, immortal task!), and the infamous spaghetti DAG where 200 tasks are lovingly connected by hand and no one dares open the Airflow UI anymore. If you’ve ever used datetime.now() in your DAG definition and watched your backfills implode, this talk is for you. From the BashOperator that became sentient to the XCom that tried to pass a whole Pandas DataFrame and the key to your mother’s house, we’ll walk through real-world bloopers with practical takeaways. You’ll learn why overusing PythonOperator is a recipe for mess, why sensors are best avoided unless you enjoy resource starvation, and why scheduling in local timezones is basically asking for a daylight saving time horror story. Other highlights include: Over-provisioning resources in KubernetesPodOperator: many teams allocate excessive memory/CPU “just in case”, leading to cluster contention and resource waste. Dynamic task mapping gone wild: 10,000 mapped tasks later… the scheduler is still crying. SLAs used as data quality guarantees: creating alerts so noisy, nobody listens. Design-free DAGs: no docs, no comments, no idea why a task has a 3-day timeout. Finally, we’ll round it out with some dos and don’ts: using environment variables, avoiding memory-hungry monolith DAGs, skipping global imports, and not allocating 10x more memory “just in case.” Whether you’re new to Airflow or battle-hardened from a thousand failed backfills, come learn how to scale your pipelines without losing your mind (or your cluster).
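The datetime.now() pitfall mentioned in this abstract can be illustrated with a small sketch. It uses only the Python standard library and a deliberately simplified scheduling rule (a first run becomes due once one full interval has elapsed after the start date); it is not Airflow's scheduler code. The names first_run_is_due and simulate are hypothetical helpers introduced for illustration, and the 30-minute parse cadence is an assumption.

```python
from datetime import datetime, timedelta, timezone

INTERVAL = timedelta(hours=1)
FIXED_START = datetime(2025, 1, 1, tzinfo=timezone.utc)


def first_run_is_due(start_date, interval, now):
    """Simplified rule (an assumption): the first run is due once one
    full interval has elapsed after start_date."""
    return now >= start_date + interval


def simulate(parse_times, dynamic):
    """Re-evaluate the DAG definition at each parse time.

    With dynamic=True the start date is recomputed as "now" on every
    parse, which mimics putting datetime.now() in the DAG definition.
    """
    results = []
    for now in parse_times:
        start = now if dynamic else FIXED_START
        results.append(first_run_is_due(start, INTERVAL, now))
    return results


# The DAG file is re-parsed every 30 minutes in this toy timeline.
parse_times = [FIXED_START + timedelta(minutes=30 * i) for i in range(6)]

static_results = simulate(parse_times, dynamic=False)
dynamic_results = simulate(parse_times, dynamic=True)

print(static_results)   # a fixed start date eventually becomes due
print(dynamic_results)  # a moving start date never does
```

In this toy model the fixed start date becomes due after the first hour, while the moving start date keeps pushing the first interval into the future, so no run is ever due.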

PR Presentation

2025-02-07 · Airflow Monthly Virtual Town Hall - February

talk

Ash Berlin-Taylor (Airflow PMC member & Director Airflow Engineering at Astronomer)

Converting Legacy Schedulers to Airflow

2024-07-01 · Airflow Summit 2024

session

Fritz Davenport (Senior Data Engineer & Team Lead, Customer Dept)

Airflow SSIS

Having helped many customers migrate thousands of workloads, we will discuss the migration process and how we built an open-source framework to migrate legacy scheduler workflows to Airflow projects via standard sets of patterns. This framework is easily extended to cover schedulers such as Automic, Autosys, Oozie, JAMS, SSIS, and others, and has turned a difficult process requiring months or years into a simple one taking days or weeks.

How the Airflow Community Productionizes Generative AI

2024-07-01 · Airflow Summit 2024

session

Pete DeJoy (co-founder and product lead)

AI/ML Airflow Data Engineering Data Quality GenAI LLM

Every data team out there is being asked by their business stakeholders about Generative AI. Taking LLM-centric workloads to production is not a trivial task. At the foundational level, there are a set of challenges around data delivery, data quality, and data ingestion that mirror traditional data engineering problems. Once you’re past those, there’s a set of challenges related to the underlying use case you’re trying to solve. Thankfully, because of how Airflow was already being used at these companies for data engineering and MLOps use cases, it has become the de facto orchestration layer behind many GenAI use cases for startups and Fortune 500s. This talk will be a tour of various methods, best practices, and considerations used in the Airflow community when taking GenAI use cases to production. We’ll focus on 4 primary use cases (RAG, fine-tuning, resource management, and batch inference) and walk through patterns that different members of the community have used to productionize this new, exciting technology.

The Frugal Dev’s Guide to LLMs

2024-05-15 · NYC Airflow Rooftop Happy Hour ft. PMC Member Jarek Potiuk!

talk

David Xue (Machine Learning Engineer), Julian LaNeve (Chief Technology Officer at Astronomer)

llms fine-tuning NLP Airflow

Julian and David will cover the hackathon project they worked on that won at the New York Stock Exchange: fine-tuning an LLM to generate summaries for Airflow task failures.

Airflow at Faire: Democratizing ML feature store framework at scale

2023-07-01 · Airflow Summit 2023

session

Victoria Varney (Sr Customer Success)

AI/ML Airflow Data Science

Data science and machine learning are at the heart of Faire’s industry-celebrated marketplace (a16z top-ranked marketplace) and drive powerful search, navigation, and risk functions, which are powered by ML models trained on 3000+ features defined by our data scientists. Previously, defining, backfilling, and maintaining the feature lifecycle was error-prone. Having a framework built on top of Airflow has empowered them to maintain and deploy their changes independently. We will explore how to leverage Airflow as a tool that can power ML training and extend it with a framework that powers a feature store, and how to enable data scientists to define new features and backfill them (a common problem in the ML world) using dynamic DAGs. The talk will provide valuable insights into how Faire constructed a framework that builds datasets to train models, and into how empowering end-users with tools isn’t something to fear but frees up engineering teams to focus on strategic initiatives.

Things to Consider When Building an Airflow Service

2023-07-01 · Airflow Summit 2023

session

Pete DeJoy (co-founder and product lead)

Airflow Cyber Security

Data platform teams often find themselves in a situation where they have to provide Airflow as a service to downstream teams, as more users and use cases in their organization require an orchestrator. In these situations, giving each team its own Airflow environment can unlock velocity and actually be lower overhead to maintain than a monolithic environment. This talk will be about things to keep in mind when building an Airflow service that supports several environments, user personas, and use cases. Namely, we’ll discuss principles to keep in mind when balancing centralized control over the data platform with decentralized teams using Airflow in the way they need. This will include observability, developer productivity, security, and infrastructure. We’ll also talk about day-2 concerns around overhead, infrastructure maintenance, and other tradeoffs to consider.

Rearrange DAG details view PR Presentation

Airflow Monthly Virtual Town Hall - March

talk

Brent Bovenzi (Staff Frontend Engineer at Astronomer)

Speakers from Astronomer