talk-data.com

Topic

Airflow

Apache Airflow

workflow_management data_orchestration etl

682 tagged

Activity Trend: 157 peak/qtr (2020-Q1 to 2026-Q1)

Activities

682 activities · Newest first

Kiwi.com started using Airflow in June 2016 as an orchestrator for a handful of people in the company. The need for the tool grew, and the monolithic instance came to be used by 30+ teams with 500+ active DAGs, resulting in 3.5 million tasks successfully finished per month. At first this single monolithic Airflow environment served us, but our needs quickly changed as we wanted to support a data mesh architecture within kiwi.com. By leveraging Astronomer on GCP, we were able to move from a monolithic Airflow environment to many smaller instances of Airflow. This talk will go into how to handle things like DAG dependencies, observability, and stakeholder management. Furthermore, we’ll talk about security, particularly how GCP’s workload identity helped us achieve a passwordless Airflow experience.

Have you ever added a DAG file and had no clue what happened to it? You’re not alone! With default settings, Airflow can wait up to 5 minutes before processing new DAG files. In this talk, I’ll discuss the implementation of an event-based DAG parser that immediately processes changes in the DAGs folder. As a result, changes are reflected immediately in the Airflow UI. In this talk I will cover: a demonstration of the event-based DAG parser and the fast Airflow UI experience; how the current DAG parser implementation and configuration work; and how an event-based DAG parser is implemented.
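
For illustration, a minimal sketch of the event-based idea (not the talk's actual implementation), assuming the watchdog library; reparse_dag is a hypothetical stand-in for whatever hands the file to Airflow's DAG file processor:

    # Conceptual sketch only: watch the DAGs folder and react to file events
    # instead of polling on a fixed interval. Assumes the `watchdog` package.
    import time

    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    DAGS_FOLDER = "/opt/airflow/dags"  # assumption: typical DAGs path

    def reparse_dag(path: str) -> None:
        # Placeholder: a real implementation would hand the file to Airflow's
        # DAG file processor so the UI reflects the change immediately.
        print(f"re-parsing {path}")

    class DagFolderHandler(FileSystemEventHandler):
        def on_any_event(self, event):
            # Only react to Python files; directories and other files are ignored.
            if not event.is_directory and event.src_path.endswith(".py"):
                reparse_dag(event.src_path)

    if __name__ == "__main__":
        observer = Observer()
        observer.schedule(DagFolderHandler(), DAGS_FOLDER, recursive=True)
        observer.start()
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            observer.stop()
        observer.join()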

As users of Airflow, we often use DagRun.conf attributes to control the content and flow of a DAG run. Previously, the Airflow UI only allowed launching a run by entering JSON. This was technically feasible but not user friendly: a user needs to model, check, and understand the JSON and enter parameters manually, with no option to validate them before triggering. Similar to Jenkins or GitHub/Azure pipelines, we want the option to trigger a run through a UI form where parameters can be specified. With Airflow 2.6.0, DAG.params are now used to render an entry form, and with a few options a user-friendly trigger UI can be implemented. This session shows how the new feature works and provides some examples of how to use it for your own purposes.
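
A minimal sketch of the feature (Airflow 2.6+), with made-up parameter names and defaults, showing how params declared on the DAG render as a validated trigger form in the UI:

    # Minimal sketch: DAG.params drive the trigger form; types, enums and
    # bounds are used for validation before the run is created.
    from datetime import datetime

    from airflow import DAG
    from airflow.models.param import Param
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="trigger_form_example",  # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        schedule=None,  # triggered manually via the UI form
        params={
            "environment": Param("staging", type="string", enum=["staging", "prod"]),
            "batch_size": Param(100, type="integer", minimum=1, maximum=1000),
        },
    ) as dag:

        def use_params(**context):
            # Values entered in the form arrive through params in the context.
            print(context["params"]["environment"], context["params"]["batch_size"])

        PythonOperator(task_id="use_params", python_callable=use_params)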

As a data engineer, I’ve used Airflow extensively over the last 5 years: across 3 jobs, several different roles; for side projects, for critical infrastructure; for manually triggered jobs, for automated workflows; for IT (Ookla/Speedtest.net), for science (Allen Institute for Cell Science), for the commons (Openverse), for liberation (Orca Collective). Authoring a DAG has changed dramatically since 2018, thanks to improvements in both Airflow and the Python language. In this session, we’ll take a trip back in time to see how DAGs looked several years ago, and what the same DAGs might look like now. We’ll appreciate the many improvements that have been made towards simplifying workflow construction. I’ll also discuss the significant advancements that have been made around deploying Airflow. Lastly, I’ll give a brief overview of different use cases and ways I’ve seen Airflow leveraged.
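
As a rough illustration of that evolution (my own toy example, not the speaker's), the same two-step pipeline written in a 2018-era style and in today's TaskFlow style:

    # Toy example of the shift in DAG authoring style; task names are invented.
    from datetime import datetime

    # Circa-2018 style: explicit DAG object, PythonOperator, manual XCom wiring.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        context["ti"].xcom_push(key="value", value=42)

    def load(**context):
        print(context["ti"].xcom_pull(task_ids="extract", key="value"))

    with DAG("old_style", start_date=datetime(2023, 1, 1), schedule=None) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="load", python_callable=load)
        t1 >> t2

    # Modern TaskFlow style (Airflow 2.x): decorators, return values passed
    # between tasks, dependencies inferred from the function calls.
    from airflow.decorators import dag, task

    @dag(start_date=datetime(2023, 1, 1), schedule=None)
    def new_style():
        @task
        def extract_value() -> int:
            return 42

        @task
        def load_value(value: int) -> None:
            print(value)

        load_value(extract_value())

    new_style()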

How to submit an issue for the community to fix. To ensure a quality product, the Airflow community relies on bug reports from Airflow users. Oftentimes bug reports are incomplete or fail to include the steps needed to reproduce the observed bug. This workshop will present an example of the bug-to-issue process: how to rule out non-Airflow issues and, once an Airflow issue is suspected, how to submit an issue for the community to see. It could also cover how the community picks up an issue and eventually fixes it in a future release. Issue reporting is key to improving Airflow, and the community will benefit from an easily digestible guide on how best to do so.

Are you tired of spending hours on Airflow migrations and wondering how to make them more accessible? Would you like to be able to test your code on different Airflow versions? Or are you struggling to set up a reliable local development environment? These are some of the top pain points for data engineers working with Airflow. But fear not – Wix Data Engineering has some best practices to share that will make your life easier. What the audience will learn: how Wix Data Engineering makes Airflow migrations easier and less painful; how to ensure DEs' code is forward-compatible with the latest Airflow version; how to test code on different Airflow versions; how to maintain a stable local environment for DEs while speeding up their dev velocity; and more must-know best practices from the framework team.
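
One common forward-compatibility pattern (my own illustration, not necessarily Wix's practice) is to guard imports and features that only exist in newer Airflow releases, so the same DAG code loads on several versions:

    # Sketch: keep DAG code importable across Airflow versions by guarding
    # newer features. The 2.4 cutoff reflects when Datasets were introduced.
    from packaging.version import Version

    import airflow

    AIRFLOW_VERSION = Version(airflow.__version__)

    if AIRFLOW_VERSION >= Version("2.4"):
        from airflow.datasets import Dataset
        schedule = [Dataset("s3://example-bucket/table")]  # data-aware scheduling
    else:
        Dataset = None
        schedule = "@daily"  # fall back to time-based scheduling on older versions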

We are continuing to modernize the Airflow UI to make it easier to manage all aspects of your DAGs. See a demo of the latest updates and improve your workflows with new tips and tricks, then get a preview of what else will be coming soon. We'll follow up with a Q&A where people can bring their own use cases and explore new ideas for improving the user experience.

New to Airflow or haven’t followed any of the recent DAG authoring enhancements? This talk is for you! We will go through various DAG authoring features like Setup/Teardown tasks (~2.7), Datasets (2.4), Dynamic Tasks (2.3) and Async tasks (2.2). You won’t be an expert after this short talk, but you’ll have a head start when you write your next DAG, no hacks required.
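
A compact sketch (my own example, not from the talk) combining two of those features, dynamic task mapping (2.3+) and setup/teardown tasks (2.7+); the file names and scratch path are made up:

    # Illustrative sketch: dynamic task mapping plus setup/teardown decorators.
    from datetime import datetime

    from airflow.decorators import dag, setup, task, teardown

    @dag(start_date=datetime(2023, 1, 1), schedule=None)
    def authoring_features():
        @setup
        def create_scratch_dir() -> str:
            return "/tmp/scratch"  # hypothetical resource prepared for the run

        @task
        def list_files() -> list[str]:
            return ["a.csv", "b.csv", "c.csv"]  # made-up inputs

        @task
        def process(filename: str) -> None:
            print(f"processing {filename}")

        @teardown
        def cleanup(path: str) -> None:
            print(f"removing {path}")

        scratch = create_scratch_dir()
        processed = process.expand(filename=list_files())  # one task per file
        scratch >> processed >> cleanup(scratch)

    authoring_features()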

Data contracts have been much discussed in the community of late, with a lot of curiosity around how to approach this concept in practice. We believe data contracts need a harmonizing layer to manage data quality in a uniform manner across a fragmented stack. We are calling this harmonizing layer the Control Plane for Data - powered by the common thread across these systems: metadata. For teams already orchestrating pipelines with Airflow, data contracts can be an effective way to process only data that meets preset quality standards. With a control plane as a connecting layer, producers can build data contracts that consumers can rely on, ensuring DAGs only run when a contract is valid. Producers can govern how workflows should behave, and consumers receive the tooling they need to opt into only high-quality data. Learn how to use data contracts and DataHub to make your Airflow pipelines more reliable - as well as other use cases that can help build a simpler, more flexible data stack.
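
As one hedged illustration of the "DAGs only run when a contract is valid" pattern (not DataHub's actual integration), a DAG that short-circuits downstream work when a hypothetical contract check fails:

    # Conceptual sketch: gate a pipeline on a data contract check.
    # `validate_contract` is a hypothetical stand-in for a real check against
    # a metadata/control-plane service such as DataHub.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.empty import EmptyOperator
    from airflow.operators.python import ShortCircuitOperator

    def validate_contract() -> bool:
        # Hypothetical: query the control plane and return True only when the
        # producer's contract (schema, freshness, quality) is satisfied.
        return True

    with DAG(
        dag_id="contract_gated_pipeline",  # made-up DAG id
        start_date=datetime(2023, 1, 1),
        schedule="@daily",
    ) as dag:
        check = ShortCircuitOperator(
            task_id="contract_is_valid",
            python_callable=validate_contract,  # False skips everything downstream
        )
        transform = EmptyOperator(task_id="transform")  # placeholder for real work
        check >> transform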

New users starting with Airflow frequently encounter several challenges, ranging from the complexities of containers and virtual environments to Python dependency hell. Moreover, their familiarity with tools such as Docker, docker-compose, and Helm might be limited, and those tools can even be overkill. In contrast, seasoned Airflow users encounter their own problems, including configuration conflicts with ongoing Airflow projects, intricacies stemming from Docker and docker-compose configurations, and a lack of visibility into all of their projects. With airflowctl, users can install and set up Airflow using a single command. Existing users can use it to manage multiple Airflow projects with different Airflow versions on the same machine, which allows creating and debugging DAGs in an IDE seamlessly. Agenda: Why airflowctl? Goal, current functionality and demo, vision/roadmap.

Airflow is a popular choice for organizations looking to integrate open-source dbt within their existing data infrastructure. This talk will explore two primary methods of running dbt in Airflow: job-level and model-level. We’ll discuss the tradeoffs associated with each approach, highlighting the simplicity and efficiency of job-level orchestration, contrasted with the enhanced observability and control provided by model-level orchestration. We’ll also explain how the balance has shifted in recent years, with improvements to dbt core making model-level more efficient and innovative Airflow extensions like Cosmos making it easier to implement. Finally, we’ll provide benchmarks to help you determine which paradigm is the best fit for your needs.
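
For reference, a minimal sketch of the job-level approach, where a single Airflow task runs an entire dbt invocation via the CLI (paths, profile location, and schedule are made up); model-level orchestration with an extension like Cosmos would instead map each dbt model to its own Airflow task:

    # Sketch of job-level dbt orchestration: one task, one `dbt run` invocation.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="dbt_job_level",  # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        schedule="@daily",
    ) as dag:
        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command=(
                "dbt run --project-dir /opt/dbt/my_project "
                "--profiles-dir /opt/dbt/profiles"
            ),
        )
        # Simple and efficient, but Airflow only sees one task: a single failed
        # model fails (and retries) the whole job, and there is no per-model
        # visibility -- the tradeoff that model-level orchestration addresses.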

Apache Airflow has over 650 Python dependencies. In case you did not know already, dependencies in Python are a difficult subject, and Airflow has its own custom ways of managing them. Airflow has a rather complex system for managing dependencies in its CI system, but this talk is not about that. This talk is directed at users of Airflow who want to keep their dependencies updated, describing the ways they can do it. This presentation will explain how to effectively manage and handle custom dependencies in Airflow. Jarek will guide you through practical solutions and best practices to make your Airflow experience with dependencies - yes, you guessed it - a breeze.
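
One well-known pattern for handling custom dependencies (my example, not necessarily the talk's recommendation) is to isolate them per task in a virtualenv, so they cannot conflict with Airflow's own dependency set; the package pin below is arbitrary:

    # Sketch: isolate a task's custom Python dependencies from Airflow's own
    # ~650 dependencies by running the task in a dedicated virtualenv.
    from datetime import datetime

    from airflow.decorators import dag, task

    @dag(start_date=datetime(2023, 1, 1), schedule=None)
    def custom_dependencies():
        @task.virtualenv(requirements=["pandas==2.1.4"], system_site_packages=False)
        def transform() -> int:
            # Imports happen inside the virtualenv, so they cannot clash with
            # the packages installed in the main Airflow environment.
            import pandas as pd
            return int(pd.Series([1, 2, 3]).sum())

        transform()

    custom_dependencies()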

Introduced in Airflow 2.4, Datasets are a foundational feature for authoring modular data pipelines. As DAGs grow to encompass a larger number of data sources and multiple data transformation steps, they typically become less predictable in the timeliness of execution and less efficient. This talk focuses on leveraging Datasets to enable predictable and more efficient DAGs, by borrowing patterns from microservice architectures. Just as large monolithic applications were decomposed into microservices to deliver more efficient scalability and faster development cycles, micropipelines have the same potential to radically transform data pipeline efficiency and development velocity. Using a simple financial analysis pipeline example, with data aggregation done in Snowflake and prediction analysis in Spark, this talk outlines how to retain the timeliness of data pipelines as data sets expand.
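
As a simple illustration of the decomposition (independent of the Snowflake/Spark example in the talk), a producer DAG and a consumer DAG connected only through a Dataset; the URI is a made-up example:

    # Sketch of a micropipeline split (Airflow 2.4+): the consumer DAG is
    # scheduled by the Dataset rather than by time, so each piece stays small
    # and runs as soon as its input is fresh.
    from datetime import datetime

    from airflow.datasets import Dataset
    from airflow.decorators import dag, task

    daily_aggregates = Dataset("s3://example-bucket/aggregates/daily")

    @dag(start_date=datetime(2023, 1, 1), schedule="@daily")
    def producer():
        @task(outlets=[daily_aggregates])
        def aggregate():
            print("writing daily aggregates")  # placeholder for the real work

        aggregate()

    @dag(start_date=datetime(2023, 1, 1), schedule=[daily_aggregates])
    def consumer():
        @task
        def predict():
            print("running predictions on fresh aggregates")

        predict()

    producer()
    consumer()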

session
by Jarek Potiuk (Apache Software Foundation), Vincent Beck

This session is about the current state of the implementation of Airflow's multi-tenancy feature. This is a long-term effort that involves multiple changes and separate AIPs to implement, with the long-term vision of a single Airflow instance supporting multiple independent teams - either from the same company or as part of an Airflow-as-a-Service offering.

Apache Airflow is one of the largest Apache projects by many metrics, but it ranks particularly high in the number of contributors involved in the project. This leads to hundreds of GitHub issues, pull requests, and discussions being submitted to the project every month, so it is critical to have an ample number of committers to support the community. In this talk I will summarize my personal experience working towards, and ultimately achieving, committer status in Apache Airflow. I’ll cover the lessons I learned along the way, as well as provide some advice and best practices to help others achieve committer status themselves.

With native support for OpenLineage in Airflow, users can now observe and manage their data pipelines with ease. This talk will cover the benefits of using OpenLineage, how it is implemented in Airflow, practical examples of how to take advantage of it, and what’s in our roadmap. Whether you’re an Airflow user or provider maintainer, this session will give you the knowledge to make the most of this tool.

Operators form the core of the language of Airflow. In this talk I will argue that while they have served their purpose, they are now holding back the development of Airflow, and that if Airflow wants to stay relevant in the world of the ’new’ data stack (hint: it isn’t currently considered to be part of it) and self-service data mesh, it needs to kill its darling.