talk-data.com

Topic

Airflow

Apache Airflow

workflow_management data_orchestration etl

Activity Trend: 157 peak/qtr, 2020-Q1 to 2026-Q1

Top Events

Airflow Summit 2025: 139
Data Engineering Podcast: 122
Airflow Summit 2024: 92
Airflow Summit 2023: 81
Airflow Summit 2022: 52
Airflow Summit 2021: 52
Airflow Summit 2020: 39
O'Reilly Data Engineering Books: 11
DATA MINER Big Data Europe Conference 2020: 5
dbt Coalesce 2022: 5
Airflow Monthly Virtual Town Hall- August: 4
Airflow Monthly Virtual Town Hall- March: 4

Top Speakers

Tobias Macey: 122
Jarek Potiuk (Apache Software Foundation): 15
Kaxil Naik: 12
Ash Berlin-Taylor (Astronomer): 11
Rafal Biegacz: 10
Vikram Koka (Astronomer): 9
John Jackson: 9
Brent Bovenzi (Astronomer): 7
Amogh Rajesh Desai: 7
Maxime Beauchemin (Preset): 7
Tatiana Al-Chueyr Martins (Astronomer): 6
Jens Scheffler: 6

Activities

Filtering by: Airflow Summit 2021

Airflow and Analytics Engineering - Dos and don'ts

2021-07-01 · Airflow Summit 2021

session

by Sergio Camilo Fandiño Hernández

Analytics Analytics Engineering

The Analytics Engineering role has emerged within data and analytics teams over the last few years. In this talk I highlight what an Analytics Engineer does and share, from my perspective, the dos and don’ts that can help a team and boost its day-to-day work with Airflow.

Airflow as the Foundation of a Multi-Faceted Data Platform

2021-07-01 · Airflow Summit 2021

session

by Jay Sen, Ry Walker (Astronomer)

Astronomer

A discussion with Jay Sen, Data Platform Architect at PayPal, and Ry Walker, Founder/CTO of Astronomer, about the central role Airflow plays within PayPal’s data platform and the opportunity to build stronger integrations between Airflow and the tools that surround it.

Airflow Extensions for Streamlined ETL Backfilling

2021-07-01 · Airflow Summit 2021

session

by Ravi Autar

AI/ML ETL/ELT

Using Airflow as our scheduling framework, we ETL data generated by tens of millions of transactions every day to build the backbone for our reports, dashboards, and training data for our machine learning models. There are over 500 (and growing) such ingested and aggregated tables owned by multiple teams that contain intricate dependencies between one another. Given this level of complexity, it can become extremely cumbersome to coordinate backfills for any given table, when also taking into account all its downstream dependencies, aggregation intervals, and data availability. This talk will focus on how we customized and extended Airflow at Adyen to streamline our backfilling operations. This allows us to prevent mistakes and enable our product teams to keep launching fast and iterating.
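
The abstract above does not describe how the Adyen extensions work. As a general illustration of the coordination problem it names, here is a minimal sketch, assuming table dependencies are available as a plain mapping from each table to the tables built from it. The function names and the mapping shape are hypothetical and are not part of Airflow or of the talk; the sketch only computes which tables a backfill touches and a safe order to rebuild them in.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def downstream_closure(table, downstream):
    """Return the table plus every table that depends on it, directly or not.

    `downstream` is a hypothetical mapping: table name -> tables built from it.
    """
    seen = set()
    stack = [table]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(downstream.get(current, ()))
    return seen


def backfill_order(table, downstream):
    """Return the affected tables in an order where parents come before children."""
    affected = downstream_closure(table, downstream)
    # TopologicalSorter expects node -> set of predecessors, so invert the edges.
    predecessors = {name: set() for name in affected}
    for parent in affected:
        for child in downstream.get(parent, ()):
            predecessors[child].add(parent)
    return list(TopologicalSorter(predecessors).static_order())
```

Calling `backfill_order("transactions", deps)` with a mapping of your own returns the starting table first and its dependents after it. A real backfill would also need to account for the aggregation intervals and data availability the abstract mentions, which this sketch ignores.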

Airflow Journey @SG

2021-07-01 · Airflow Summit 2021

session

by Alaeddine Maaoui, Ahmed Chakir Alaoui

This talk will cover the adoption journey (technical challenges and team organization) of Apache Airflow (1.8 to 2.0) at Societe Generale. Timeline of events:
POC with v1.8 to convince our management
Shared infrastructure with v1.10.2
Multiple infrastructures with v1.10.12
On-demand service offer with v2.0 (challenges and lessons learned)

Airflow loves Kubernetes

2021-07-01 · Airflow Summit 2021

session

by Jarek Potiuk (Apache Software Foundation), Kaxil Naik

Docker Kubernetes

In this talk Jarek and Kaxil will talk about official community support for running Airflow in a Kubernetes environment. Full support for Kubernetes deployments has been in development by the community for quite a while; in the past, Airflow users had to rely on third-party images and Helm charts to run Airflow on Kubernetes. Over the last year community members made an enormous effort to provide robust, simple and versatile support for those deployments that would serve all kinds of Airflow users: starting with the official container image, continuing with a quick-start docker-compose configuration, and culminating in April with the release of the official Helm Chart for Airflow. This talk is aimed at Airflow users who would like to make use of all that effort. Users will learn how to:
Extend or customize the official Airflow Docker image to adapt it to their needs
Run a quick-start docker-compose environment where they can quickly verify their images
Configure and deploy Airflow on Kubernetes using the official Airflow Helm chart

Airflow: The Power of Stitching Services Together

2021-07-01 · Airflow Summit 2021

session

by Rafal Biegacz, Filip Knapik

AWS Glue Cloud Computing Cloud Composer

Apache Airflow is known to be a great orchestration tool that enables use cases that would not be possible otherwise. One of Airflow's great features is the ability to “glue” together totally separate services to build larger functionality. In this talk you will learn about various uses of Airflow that let users automate their critical company processes and even establish businesses. The examples provided are based on Airflow as used in Cloud Composer, a managed service for provisioning and managing Airflow instances.

An On-Demand Airflow Service for Internet Scale Gameplay Pipelines

2021-07-01 · Airflow Summit 2021

session

by Nitish Victor, Yuanmeng Zeng

ETL/ELT

EA Games has very dynamic and federated needs for its data processing pipelines. Many individual studios within EA build and manage the data pipelines for their games, iterating rapidly through game development cycles. Developer productivity around orchestrating these pipelines is as critical as providing a robust, production-quality orchestration service. With these in mind, we re-engineered our Airflow service from the ground up to cater to our large internal user base (1000s) and internet-scale data processing systems (petabytes of data). This session details the evolution of the use of Airflow at EA Digital Platform from a monolithic multi-tenant instance to an “On-Demand” system where teams and studios create their own dedicated Airflow instance, with all the necessary bells and whistles, at the click of a button, allowing them to get their data pipelines running immediately. We also describe how Airflow is interwoven into a “Self Serve” model for ETL pipelines within our teams, with the objective of truly democratizing data across our games.

Apache Airflow 2.0 on Amazon MWAA

2021-07-01 · Airflow Summit 2021

session

by John Jackson, Sam Dengler

In this session we will discuss Amazon Managed Workflows for Apache Airflow (MWAA), how Apache Airflow (and specifically version 2.0) is implemented in the service, best practices for deployment and operations, and the Amazon MWAA team’s commitment to open source usage and contributions.

Apache Airflow and Ray: Orchestrating ML at Scale

2021-07-01 · Airflow Summit 2021

session

by Daniel Imberman

AI/ML Pandas TensorFlow

As the Apache Airflow project grows, we seek both ways to incorporate rising technologies and novel ways to expose them to our users. Ray is one of the fastest-growing distributed computation systems on the market today. In this talk, we will introduce the Ray decorator and Ray backend. These features, built with the help of the Ray maintainers at Anyscale, will allow data scientists to natively integrate their distributed pandas, XGBoost, and TensorFlow jobs into their Airflow pipelines with a single decorator. By combining the orchestration of Airflow with the distributed computation of Ray, this integration opens up a whole host of new possibilities for Airflow users designing their pipelines.

Apache Airflow at Apple - Multi-tenant Airflow and Custom Operators

2021-07-01 · Airflow Summit 2021

session

by Roberto Santamaria, Howie Wang

Running a platform where different business units at Apple can run their workloads in isolation and share operators.

Apache Airflow at Wise

2021-07-01 · Airflow Summit 2021

session

by Alexandra Abbas

AI/ML

Wise (previously TransferWise) is a London-based fintech company. We build a better way of sending money internationally. At Wise we make great use of Airflow. More than 100 data scientists, analysts and engineers use Airflow every day to generate reports, prepare data, (re)train machine learning models and monitor services. My name is Alexandra, and I’m a Machine Learning Engineer at Wise. Our team is responsible for building and maintaining Wise’s Airflow instances. In this presentation I would like to talk about three main things: our current setup, our challenges and our future plans with Airflow. We are currently transitioning from a single centralised Airflow instance to many segregated instances to increase reliability and limit access. We’ve learned a lot throughout this journey and are looking to share these learnings with a wider audience.

Autoscaling in Airflow - Lessons learned

2021-07-01 · Airflow Summit 2021

session

by Anita Fronczak

Cloud Computing Cloud Composer Kubernetes

Autoscaling in Airflow: what we learned from the Cloud Composer case. We would like to present how we approach the autoscaling problem for Airflow running on Kubernetes in Cloud Composer: how we calculate our autoscaling metric, what problem we had with scaling down, and how we solved it. We also share ideas on what could be improved in the current solution and how.

Building an Elastic Platform Using Airflow Uniquely as an Orchestrator

2021-07-01 · Airflow Summit 2021

session

by Lucas Fonseca (QuintoAndar), Rafael Ribaldo (QuintoAndar)

ELK

At QuintoAndar we seek automation and scalability in our data pipelines and believe that Airflow is the right tool for giving us exactly what we need. However, having all concerns mapped and tooling defined doesn’t necessarily mean success. For months we struggled with the misconception that Airflow should act as both orchestrator and executor within a monolithic strategy. That could not be further from the truth, given the scalability and performance issues, infrastructure and maintainability costs, and multi-directional impact across development teams that arose. Employing Airflow as an orchestration-only solution, though, may help teams deliver value to end users in a more efficient, reliable and performant manner, where data pipelines can be executed anywhere with proper resources and optimizations. Those are the reasons we have shifted from an orchestrate-execute strategy to an orchestrate-only one, in order to leverage the full power of data pipeline management in Airflow. The separation of data processing and pipeline coordination immediately brought not only finer resource tuning and better maintainability, but also tremendous scalability on both ends.

Building a robust data pipeline with the dAG stack: dbt, Airflow, Great Expectations

2021-07-01 · Airflow Summit 2021

session

by Sam Bail (Superconductive)

Data Engineering Data Quality Data Science dbt

Data quality has become a much discussed topic in the fields of data engineering and data science, and it has become clear that data validation is absolutely crucial to ensuring the reliability of any data products and insights produced by an organization’s data pipelines. This session will outline patterns for combining three popular open source tools in the data ecosystem - dbt, Airflow, and Great Expectations - and use them to build a robust data pipeline with data validation at each critical step.

Building a Scalable & Isolated Architecture for Preprocessing Medical Records

2021-07-01 · Airflow Summit 2021

session

by Mikaela Pisani, Anthony Figueroa

Kubernetes NLP Spark

After performing several experiments with Airflow, we arrived at the best architectural design for processing text medical records at scale. Our hybrid solution uses Kubernetes, Apache Airflow, Apache Livy, and Apache cTAKES. Using Kubernetes containers gives each component of the pipeline a consistent, portable, and isolated environment. With Apache Livy, you can run tasks in a Spark cluster at scale. Additionally, Apache cTAKES helps extract information from the clinical free text of electronic medical records, using natural language processing techniques to identify codable entities, temporal events, properties, and relations.

Building Providers & DAGs in the Airflow Ecosystem

2021-07-01 · Airflow Summit 2021

session

by Plinio Guzman

Learn how to use Airflow’s robust ecosystem of providers to construct secure, high-quality DAGs.

Building the AirflowEventStream

2021-07-01 · Airflow Summit 2021

session

by Jelle Munk (Adyen)

AI/ML Big Data Java

Or how to keep our traditional Java application up to date on everything big data. At Adyen we process tens of millions of transactions a day, a number that rises every day. This means that generating reports, training machine learning models or any other operation that requires a bird’s-eye view of weeks or months of data requires the use of Big Data technologies. We recently migrated to Airflow for scheduling all batch operations on our on-premise Big Data cluster. Some of these operations require input from our merchants or our support team. Merchants can, for instance, subscribe to reports, choose their preferred time zone, and even specify which columns they want included. Once generated, these reports need to become available in our customer portal. So how do we keep track in our Customer Area of which reports have been generated in Airflow? How do we launch ad-hoc backfills when one of our merchants subscribes to a new report? How do we integrate all of this into our existing monitoring pipeline? This talk will focus on how we have successfully integrated our big data platform with our existing Java web applications and how Airflow (with some simple add-ons) played a crucial role in achieving this.

Building the Data Science Platform with Airflow @Near

2021-07-01 · Airflow Summit 2021

session

by Manmeet Kaur

Data Science

At Near we work on TBs of location data with close to real-time modelling to generate key consumer insights and estimates for our clients across the globe. We have hundreds of country-specific models deployed and managed through Airflow to achieve this goal. Some of the workflows that we have deployed are schedule-based, some are dynamic and some are trigger-based. In this session I will discuss some of the workflows that are scheduled and monitored using Airflow, their key benefits, and the challenges that we have faced in our production systems.

Clearing Airflow obstructions

2021-07-01 · Airflow Summit 2021

session

by Tatiana Al-Chueyr Martins (Astronomer)

AI/ML

Apache Airflow aims to speed the development of workflows, but developers are always ready to add bugs here and there. This talk illustrates a few pitfalls faced while developing workflows at the BBC to build machine learning models. The objective is to share some lessons learned and, hopefully, save others time. Some of the topics covered, with code examples:
Tasks unsuitable to be run from within Airflow executors
Plugin misuse
Inconsistency while using an operator
(Mis)configuration
What to avoid during a workflow deployment
Consequences of non-idempotent tasks
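
The talk's own code examples are not reproduced on this page. As a generic illustration of the last topic, here is a minimal sketch of why non-idempotent tasks cause trouble when a task is retried or re-run. It uses a plain dictionary as a stand-in for a partitioned table; the function names are hypothetical and nothing here is Airflow API or taken from the talk.

```python
def load_append(store, day, rows):
    """Non-idempotent load: every run appends, so a retry duplicates rows."""
    store.setdefault(day, []).extend(rows)


def load_overwrite(store, day, rows):
    """Idempotent load: every run replaces the day's partition wholesale."""
    store[day] = list(rows)


def run_twice(load, rows):
    """Simulate a task that runs, then is retried for the same day."""
    store = {}
    for _attempt in range(2):
        load(store, "2021-07-01", rows)
    return store["2021-07-01"]
```

Passing `load_append` to `run_twice` leaves the partition with each row stored twice, while `load_overwrite` leaves the same contents no matter how many times the task runs for that day.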

Contributing to Apache Airflow: First Steps

2021-07-01 · Airflow Summit 2021

session

by Ryan Hatter

Learn to contribute to the Apache Airflow ecosystem both with and without code. Post an article to the Airflow blog, improve documentation, or dive head-first into Airflow’s free and open source software community.
