talk-data.com

Topic: Airflow (Apache Airflow)

Tags: workflow_management, data_orchestration, etl

81 tagged

Activity Trend: peak 157 activities/quarter (2020-Q1 to 2026-Q1)

Activities

Showing results filtered by: Airflow Summit 2023

Apache Airflow is a popular workflow platform, but it often faces critiques that may not paint the whole picture. In this talk, we will unpack the critiques of Apache Airflow and provide a balanced analysis. We will highlight the areas where these critiques correctly point out Airflow’s weaknesses, debunk common myths, and showcase where competitors like Dagster and Prefect are excelling. By understanding the pros and cons of Apache Airflow, attendees will be better equipped to make informed decisions about whether Airflow is the right choice for their use cases. This talk will provide a comprehensive and objective assessment of Apache Airflow and its place in the workflow management ecosystem. Notes:
- What Critics Get Right About Airflow’s Weaknesses
- Debunking Myths and Misconceptions About Airflow
- Competitor Analysis
- Real-World Use Cases: When Airflow Shines
- Making Informed Decisions: Choosing the Right Workflow Platform

At King, data is fundamental in helping us deliver the best possible experiences for the players of our games while continually bringing them new, innovative and evolving gameplay features. Data has to be “always-on”: downtime and accuracy are treated with the same level of diligence as any of our games, and success is measured against internal SLAs. How is King using ‘data reliability engineering as code’ tools such as SodaCore within Airflow pipelines to detect, diagnose and inform about data issues, creating coverage, improving quality and accuracy, and helping eliminate data downtime?
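
A minimal sketch of the pattern, assuming the soda-core package and hypothetical configuration/check file names; this illustrates the general idea rather than King's actual pipelines:

    from airflow.decorators import dag, task
    import pendulum

    @dag(schedule="@daily", start_date=pendulum.datetime(2023, 1, 1), catchup=False)
    def data_reliability_checks():
        @task
        def soda_scan():
            from soda.scan import Scan  # provided by the soda-core package
            scan = Scan()
            scan.set_data_source_name("warehouse")                 # hypothetical data source
            scan.add_configuration_yaml_file("configuration.yml")  # hypothetical path
            scan.add_sodacl_yaml_file("checks.yml")                # hypothetical path
            exit_code = scan.execute()
            if exit_code != 0:  # non-zero means checks failed or errored
                raise ValueError(scan.get_logs_text())  # fail the task so on-call is informed

        soda_scan()

    data_reliability_checks()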

Discover PepsiCo’s dynamic data quality strategy in a multi-cloud landscape. Join me, the Director of Data Engineering, as I unveil our Airflow utilization, custom operator integration, and the power of Great Expectations. Learn how we’ve harmonized Data Mesh principles with our decentralized development process for seamless data integration. Explore our journey to maintain quality and enhance data as a strategic asset at PepsiCo.

Productive cross-team collaboration between data engineers and analysts is the goal of all data teams; however, fulfilling that mission can be challenging given the diverse set of skills that each group brings. In this talk we present an example of how one team tackled this topic by creating a flexible, dynamic and extensible framework using Airflow and cloud services that allowed engineers and analysts to jointly create data-centric micro-services to serve up projections and other robust analysis for use in the organization. The framework, which utilized dynamic DAG generation configured using yaml files, Kubernetes jobs and dbt transformations, abstracted away many of the details associated with workflow orchestration, allowing analysts to focus on their Python or R code and data processing logic while enabling data engineers to monitor the pipelines and ensure their scalability.
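
A minimal sketch of the dynamic-DAG-generation piece, assuming a hypothetical yaml schema (dag_id, schedule, and a list of named shell steps); the framework described above additionally wires in Kubernetes jobs and dbt:

    import glob
    import yaml
    import pendulum
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    for path in glob.glob("/opt/airflow/configs/*.yaml"):  # hypothetical config location
        with open(path) as f:
            cfg = yaml.safe_load(f)

        with DAG(
            dag_id=cfg["dag_id"],
            schedule=cfg.get("schedule", "@daily"),
            start_date=pendulum.datetime(2023, 1, 1),
            catchup=False,
        ) as dag:
            # Analysts declare steps (e.g. an R script or dbt command) in yaml;
            # the framework turns each one into a task, chained in order.
            prev = None
            for step in cfg["steps"]:
                t = BashOperator(task_id=step["name"], bash_command=step["command"])
                if prev:
                    prev >> t
                prev = t

        # Register the generated DAG at module level so Airflow discovers it.
        globals()[cfg["dag_id"]] = dag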

Kiwi.com started using Airflow in June 2016 as an orchestrator for a handful of people in the company. Demand for the tool grew until the monolithic instance was used by 30+ teams with 500+ active DAGs, successfully finishing 3.5 million tasks per month. That monolithic Airflow environment served us at first, but our needs quickly changed as we wanted to support a data mesh architecture within kiwi.com. By leveraging Astronomer on GCP, we were able to move from a monolithic Airflow environment to many smaller instances of Airflow. This talk will go into how to handle things like DAG dependencies, observability, and stakeholder management. Furthermore, we’ll talk about security, particularly how GCP’s workload identity helped us achieve a passwordless Airflow experience.

Have you ever added a DAG file and had no clue what happened to it? You’re not alone! With default settings, Airflow can wait up to 5 minutes before processing new DAG files. In this talk, I’ll discuss the implementation of an event-based DAG parser that immediately processes changes in the DAGs folder. As a result, changes are reflected immediately in the Airflow UI. This talk will cover:
- A demonstration of the event-based DAG parser and the fast Airflow UI experience
- How the current DAG parser implementation and configuration works
- How an event-based DAG parser is implemented
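
The core idea can be sketched with the third-party watchdog library; this illustrates event-based file watching in general, not the speaker's actual implementation:

    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class DagFileHandler(FileSystemEventHandler):
        def on_any_event(self, event):
            if event.src_path.endswith(".py"):
                # A real parser would enqueue this file for immediate parsing
                # and serialization to the metadata database.
                print(f"re-parse {event.src_path}")

    observer = Observer()
    observer.schedule(DagFileHandler(), "/opt/airflow/dags", recursive=True)
    observer.start()  # filesystem events now trigger parsing instead of a polling loop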

As Airflow users, we often use DagRun.conf attributes to control the content and flow of a DAG run. Previously, the Airflow UI only allowed triggering a DAG run by entering raw JSON. This was technically feasible but not user friendly: a user needed to model, check and understand the JSON and enter parameters manually, with no option to validate before triggering. Much like Jenkins or GitHub/Azure pipelines, we want a UI option to trigger a run by specifying parameters in a form. With Airflow 2.6.0, DAG.params are now used to render an entry form, and with a few options a user-friendly trigger UI can be implemented. This session shows how the new feature works and provides examples of how to use it for your own purposes.
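
For example, a DAG whose typed params render as a validated entry form in Airflow 2.6's "Trigger DAG w/ config" UI (the field names here are arbitrary):

    import pendulum
    from airflow import DAG
    from airflow.models.param import Param
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="trigger_form_demo",
        schedule=None,  # manually triggered
        start_date=pendulum.datetime(2023, 1, 1),
        params={
            # Each Param becomes a form field, validated before the run starts.
            "environment": Param("dev", type="string", enum=["dev", "staging", "prod"]),
            "batch_size": Param(100, type="integer", minimum=1, maximum=1000),
        },
    ):
        BashOperator(
            task_id="echo_params",
            bash_command="echo {{ params.environment }} {{ params.batch_size }}",
        )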

As a data engineer, I’ve used Airflow extensively over the last 5 years: across 3 jobs, several different roles; for side projects, for critical infrastructure; for manually triggered jobs, for automated workflows; for IT (Ookla/Speedtest.net), for science (Allen Institute for Cell Science), for the commons (Openverse), for liberation (Orca Collective). Authoring a DAG has changed dramatically since 2018, thanks to improvements in both Airflow and the Python language. In this session, we’ll take a trip back in time to see how DAGs looked several years ago, and what the same DAGs might look like now. We’ll appreciate the many improvements that have been made towards simplifying workflow construction. I’ll also discuss the significant advancements that have been made around deploying Airflow. Lastly, I’ll give a brief overview of different use cases and ways I’ve seen Airflow leveraged.
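
As a taste of that evolution, here is roughly the same two-task pipeline in the old operator style versus the TaskFlow API introduced in Airflow 2.0 (simplified for illustration):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Then: explicit PythonOperators, data passed via manual XCom pulls.
    def extract():
        return [1, 2, 3]

    def load(ti):
        print(sum(ti.xcom_pull(task_ids="extract")))

    with DAG("etl_2018_style", start_date=datetime(2023, 1, 1), schedule=None):
        PythonOperator(task_id="extract", python_callable=extract) >> \
            PythonOperator(task_id="load", python_callable=load)

    # Now: TaskFlow API - plain functions, implicit XCom, dependencies
    # inferred from the call graph.
    from airflow.decorators import dag, task

    @dag(start_date=datetime(2023, 1, 1), schedule=None)
    def etl_taskflow():
        @task
        def extract_data():
            return [1, 2, 3]

        @task
        def load_data(data):
            print(sum(data))

        load_data(extract_data())

    etl_taskflow()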

How to submit an issue for the community to fix. To ensure a quality product, the Airflow community relies on bug reports from Airflow users. Often, bug reports are incomplete or fail to include steps to reproduce the observed bug. This workshop will present an example of the bug-to-issue process: how to rule out non-Airflow issues and, once an Airflow issue is suspected, how to submit an issue for the community to see. It will also cover how the community picks up an issue and eventually fixes it in a future release. Issue reporting is key to improving Airflow, and the community will benefit from an easily digestible guide on how best to do so.

Are you tired of spending hours on Airflow migrations and wondering how to make them more accessible? Would you like to be able to test your code on different Airflow versions? Or are you struggling to set up a reliable local development environment? These are some of the top pain points for data engineers working with Airflow. But fear not: Wix Data Engineering has some best practices to share that will make your life easier. What the audience will learn:
- How Wix Data Engineering makes Airflow migrations easier and less painful
- How to ensure data engineers’ code is forward-compatible with the latest Airflow version
- How to test code on different Airflow versions
- How to maintain a stable local environment for data engineers while speeding up their dev velocity
- More must-know best practices from the framework team
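
One widely used building block for this (a generic pattern, not necessarily Wix's framework) is a DAG integrity test that can be run unchanged inside environments pinned to different Airflow versions:

    from airflow.models import DagBag

    def test_dags_import_cleanly():
        # Re-run this same test under each Airflow version you target;
        # import errors surface forward-compatibility breaks early.
        dag_bag = DagBag(dag_folder="dags/", include_examples=False)
        assert dag_bag.import_errors == {}, dag_bag.import_errors
        assert len(dag_bag.dags) > 0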

We are continuing to modernize the Airflow UI to make it easier to manage all aspects of your DAGs. See a demo of the latest updates and improve your workflows with new tips and tricks, then get a preview of what else is coming soon. We’ll follow up with a Q&A where people can raise their own use cases and explore new ideas for improving the user experience.

New to Airflow or haven’t followed any of the recent DAG authoring enhancements? This talk is for you! We will go through various DAG authoring features like Setup/Teardown tasks (~2.7), Datasets (2.4), Dynamic Tasks (2.3) and Async tasks (2.2). You won’t be an expert after this short talk; however, you’ll have a head start when you write your next DAG, no hacks required.
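
As a small taste, dynamic task mapping (2.3) lets the number of task instances be decided at runtime; a minimal sketch:

    import pendulum
    from airflow.decorators import dag, task

    @dag(start_date=pendulum.datetime(2023, 1, 1), schedule=None)
    def dynamic_mapping_demo():
        @task
        def list_files():
            return ["a.csv", "b.csv", "c.csv"]  # hypothetical inputs

        @task
        def process(file):
            print(f"processing {file}")

        # One mapped task instance per file, expanded at runtime.
        process.expand(file=list_files())

    dynamic_mapping_demo()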

Data contracts have been much discussed in the community of late, with a lot of curiosity around how to approach this concept in practice. We believe data contracts need a harmonizing layer to manage data quality in a uniform manner across a fragmented stack. We are calling this harmonizing layer the Control Plane for Data - powered by the common thread across these systems: metadata. For teams already orchestrating pipelines with Airflow, data contracts can be an effective way to process data that meets preset quality standards. With a control plane as a connecting layer, producers can build data contracts that consumers can rely on, ensuring DAGs only run when a contract is valid. Producers can govern how workflows should behave, and consumers receive the tooling they need to opt into only high-quality data. Learn how to use data contracts and DataHub to make your Airflow pipelines more reliable - as well as other use cases that can help build a simpler, more flexible data stack.
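
One way to express "DAGs only run when a contract is valid" in plain Airflow terms (an illustrative pattern; the talk's DataHub integration is richer than this):

    import pendulum
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator
    from airflow.operators.python import ShortCircuitOperator

    def contract_is_valid() -> bool:
        # Stand-in for a real lookup against the metadata layer (e.g. DataHub):
        # return True only while the producer's contract currently validates.
        return True

    with DAG("contract_gated_pipeline", start_date=pendulum.datetime(2023, 1, 1),
             schedule="@daily", catchup=False):
        gate = ShortCircuitOperator(task_id="check_contract",
                                    python_callable=contract_is_valid)
        process = EmptyOperator(task_id="process_orders")
        gate >> process  # downstream is skipped whenever the check returns False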

New users starting with Airflow frequently encounter several challenges, ranging from the complexities of containers and virtual environments to Python dependency hell. Moreover, their familiarity with tools such as Docker, docker-compose, and Helm might be limited, and those tools can even be overkill. Seasoned Airflow users encounter their own problems, including configuration conflicts with ongoing Airflow projects, intricacies stemming from Docker and docker-compose configurations, and a lack of visibility into all their projects. With airflowctl, users can install and set up Airflow using a single command. Existing users can use it to manage multiple Airflow projects with different Airflow versions on the same machine, which allows creating and debugging DAGs in an IDE seamlessly. Agenda:
- Why airflowctl? Goal
- Current functionality & demo
- Vision / roadmap
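
Illustrative usage; the command shapes below are assumptions based on the project's README, and flags may differ between versions (check airflowctl --help):

    airflowctl init my_project --airflow-version 2.7.0   # scaffold a project (flag assumed)
    airflowctl start my_project                          # run Airflow for that project in its own virtualenv
    airflowctl list                                      # see every project managed on this machine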

Airflow is a popular choice for organizations looking to integrate open-source dbt within their existing data infrastructure. This talk will explore two primary methods of running dbt in Airflow: job-level and model-level. We’ll discuss the tradeoffs associated with each approach, highlighting the simplicity and efficiency of job-level orchestration, contrasted with the enhanced observability and control provided by model-level orchestration. We’ll also explain how the balance has shifted in recent years, with improvements to dbt core making model-level more efficient and innovative Airflow extensions like Cosmos making it easier to implement. Finally, we’ll provide benchmarks to help you determine which paradigm is the best fit for your needs.
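
In miniature, the two paradigms look like this; the job-level task is plain Airflow, while the Cosmos import paths are abbreviated assumptions (see the Cosmos docs for the full API):

    import pendulum
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG("dbt_job_level", start_date=pendulum.datetime(2023, 1, 1),
             schedule="@daily", catchup=False):
        # Job-level: one opaque task runs the entire dbt project; simple, but a
        # failure retries the whole job and the UI shows a single node.
        BashOperator(
            task_id="dbt_run",
            bash_command="dbt run --project-dir /opt/dbt/jaffle_shop",  # hypothetical path
        )

    # Model-level: Cosmos expands the project into one task per model, giving
    # per-model retries and observability (import paths are assumptions):
    # from cosmos import DbtDag, ProjectConfig, ProfileConfig
    # dag = DbtDag(project_config=ProjectConfig("/opt/dbt/jaffle_shop"), ...)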

Apache Airflow has over 650 Python dependencies. In case you did not know already, dependencies in Python are a difficult subject, and Airflow has its own, custom ways of managing them. Airflow has a rather complex system to manage dependencies in its CI system, but this talk is not about that. This talk is directed at users of Airflow who want to keep their dependencies updated, describing the ways they can do it. This presentation will explain how to effectively manage and handle custom dependencies in Airflow. Jarek will guide you through practical solutions and best practices to make your Airflow experience with dependencies - yes, you guessed it - a breeze.
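
For reference, the mechanism the Airflow project itself recommends is the per-release constraint files, which pin every transitive dependency to a tested set (substitute your own Airflow and Python versions):

    pip install "apache-airflow==2.7.1" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.1/constraints-3.10.txt"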

Introduced in Airflow 2.4, Datasets are a foundational feature for authoring modular data pipelines. As DAGs grow to cover a larger number of data sources and multiple data transformation steps, they typically become less predictable in their timeliness of execution and less efficient. This talk focuses on leveraging Datasets to enable predictable and more efficient DAGs by borrowing patterns from microservice architectures. Just as large monolithic applications were decomposed into microservices to deliver more efficient scalability and faster development cycles, micropipelines have the same potential to radically transform data pipeline efficiency and development velocity. Using a simple financial analysis pipeline example, with data aggregation done in Snowflake and prediction analysis in Spark, this talk outlines how to retain the timeliness of data pipelines while expanding datasets.
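
A sketch of that decomposition: the aggregation DAG updates a Dataset, and the prediction DAG is scheduled on the Dataset rather than on a clock (the URI and task bodies are hypothetical placeholders):

    import pendulum
    from airflow.datasets import Dataset
    from airflow.decorators import dag, task

    aggregates = Dataset("snowflake://analytics/daily_aggregates")  # hypothetical URI

    @dag(schedule="@daily", start_date=pendulum.datetime(2023, 1, 1), catchup=False)
    def aggregate_in_snowflake():
        @task(outlets=[aggregates])  # marks the Dataset as updated on success
        def aggregate():
            ...  # run the Snowflake aggregation here
        aggregate()

    @dag(schedule=[aggregates], start_date=pendulum.datetime(2023, 1, 1), catchup=False)
    def predict_in_spark():
        @task
        def predict():
            ...  # submit the Spark prediction job here
        predict()

    aggregate_in_snowflake()
    predict_in_spark()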