In legacy Airflow 2.x, each DAG run was tied to a unique “execution_date.” By removing this requirement, Airflow 3 can directly support a variety of new use cases, such as model training and generative AI inference, without the hacks and workarounds machine learning and AI engineers have typically relied on. In this talk, we will delve into the significant advancements in Airflow 3 that enable GenAI and MLOps use cases, particularly the changes outlined in AIP-83. We’ll cover key changes like the renaming of “execution_date” to “logical_date,” which is now allowed to be null, and the introduction of the new “run_after” field, which provides a more meaningful mechanism for scheduling and sorting. Finally, we’ll show how Airflow 3 enables multiple parallel runs of the same DAG, empowering diverse triggering mechanisms and simplifying backfill logic, all illustrated with a real-world demo.
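
To make the change concrete, here is a minimal sketch of a run-anytime DAG, assuming the Airflow 3 task SDK imports and the AIP-83 field names described above; the DAG and task names are hypothetical, and exact context attributes may vary by version:

    from airflow.sdk import dag, task, get_current_context

    @dag(schedule=None)  # no timetable: runs come from the API, assets, or the UI
    def genai_batch_inference():
        @task
        def run_inference():
            ctx = get_current_context()
            dag_run = ctx["dag_run"]
            # Per AIP-83, logical_date may be None for manually or externally
            # triggered runs; run_after is always populated and is the field
            # to use for ordering and scheduling decisions.
            print("logical_date:", dag_run.logical_date)
            print("run_after:", dag_run.run_after)

        run_inference()

    genai_batch_inference()

Because no uniqueness constraint remains on the logical date, several such runs can be triggered in parallel (for example, one inference run per incoming request batch).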
Speaker
Ankit Chaurasia
Talks & appearances
Airflow 3 has officially arrived! In this session, we’ll start by discussing prerequisites for a smooth upgrade from Airflow 2.x to Airflow 3, including Airflow version requirements, removing deprecated SubDAGs, and backing up and cleaning your metadata database prior to migration. We’ll then explore the new CLI utility, airflow config update [--fix], for auto-applying configuration changes, and demo cleaning old XCom data to speed up the schema migration. Attendees will learn to verify and adapt their pipelines for Airflow 3 using a Ruff-based upgrade utility. I will demo running ruff check dag/ --select AIR301 to surface scheduling issues, inspecting fixes via ruff check dag/ --select AIR301 --show-fixes, and applying corrections with ruff check dag/ --select AIR301 --fix. We’ll also examine rules AIR302 for deprecated config and AIR303 for provider package migrations. By the end, your DAGs will pass all AIR3xx checks error-free. Join this session for live demos and practical examples that will empower you to confidently upgrade, minimise downtime, and achieve optimal performance in Airflow 3.
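
For a sense of what these checks catch, here is a hypothetical Airflow 2.x-style DAG annotated with the kinds of patterns the AIR3xx rules surface; the dag_id is made up, and the rule-to-pattern mapping follows the session description rather than the Ruff docs:

    import pendulum
    from airflow import DAG
    from airflow.operators.python import PythonOperator  # flagged: moved to the
    # apache-airflow-providers-standard package in Airflow 3 (provider migration)

    def report(**context):
        # flagged: the "execution_date" context key was removed in Airflow 3;
        # use "logical_date" instead
        print(context["execution_date"])

    with DAG(
        dag_id="legacy_etl",
        schedule_interval="@daily",  # flagged: renamed to schedule= in Airflow 3
        start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    ):
        PythonOperator(task_id="report", python_callable=report)

Running the --fix variant applies the mechanical renames automatically, leaving only the genuinely behavioural changes for manual review.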
Are you looking to harness the full potential of data-driven pipelines with Apache Airflow? This session will dive into the newly introduced conditional expressions for advanced dataset scheduling in Airflow - a feature highly requested by the Airflow community. Attendees will learn how to effectively use logical operators to create complex dependencies that trigger DAGs based on dataset updates in real-world scenarios. We’ll also explore the innovative DatasetOrTimeSchedule, which combines time-based and dataset-triggered scheduling for unparalleled flexibility. Furthermore, attendees will discover the latest API endpoints that facilitate external updates and resets of dataset events, streamlining workflow management across different deployments. This talk also aims to explain: the basics of using conditional expressions for dataset scheduling; how to integrate time-based schedules with dataset triggers; practical applications of the new API endpoints for enhanced dataset management; and real-world examples of how these features can optimize your data workflows.
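
As a sketch of both features, assuming the Airflow 2.9+ dataset APIs (the dataset URIs and DAG names here are hypothetical):

    import pendulum
    from airflow.datasets import Dataset
    from airflow.decorators import dag, task
    from airflow.timetables.datasets import DatasetOrTimeSchedule
    from airflow.timetables.trigger import CronTriggerTimetable

    orders = Dataset("s3://example-bucket/orders")
    customers = Dataset("s3://example-bucket/customers")

    # Conditional expression: run only when BOTH upstream datasets update.
    @dag(schedule=(orders & customers))
    def joined_report():
        @task
        def build():
            ...
        build()

    # Combined schedule: run on a daily cron OR when EITHER dataset updates.
    @dag(
        schedule=DatasetOrTimeSchedule(
            timetable=CronTriggerTimetable("0 6 * * *", timezone="UTC"),
            datasets=(orders | customers),
        ),
        start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
        catchup=False,
    )
    def daily_or_on_update():
        @task
        def refresh():
            ...
        refresh()

    joined_report()
    daily_or_on_update()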