talk-data.com

Topic

Airflow

Apache Airflow

workflow_management data_orchestration etl

Activities

tagged

Activity Trend

157 peak/qtr

2020-Q1 2026-Q1

Top Events

Airflow Summit 2025 139 Data Engineering Podcast 122 Airflow Summit 2024 92 Airflow Summit 2023 81 Airflow Summit 2022 52 Airflow Summit 2021 52 Airflow Summit 2020 39 O'Reilly Data Engineering Books 11 DATA MINER Big Data Europe Conference 2020 5 dbt Coalesce 2022 5 Airflow Monthly Virtual Town Hall- August 4 Airflow Monthly Virtual Town Hall- March 4

Top Speakers

Tobias Macey 122 Jarek Potiuk (Apache Software Foundation) 15 Kaxil Naik 12 Ash Berlin-Taylor (Astronomer) 11 Rafal Biegacz 10 Vikram Koka (Astronomer) 9 John Jackson 9 Brent Bovenzi (Astronomer) 7 Amogh Rajesh Desai 7 Maxime Beauchemin (Preset) 7 Tatiana Al-Chueyr Martins (Astronomer) 6 Jens Scheffler 6

Activities

Showing filtered results

All Video Podcast Book

Filtering by: Airflow Summit 2022 ×

TFX on Airflow with delegation of processing to third party services

2022-07-01 · Airflow Summit 2022

session

by Israel Herraiz , Paul Balm

AI/ML Flink Cloud Computing Dataflow GCP Spark TensorFlow

Get your ticket for this workshop Tensorflow Extended (TFX) can run machine learning pipelines on Airflow, but all the steps are run by default in the same workers where the Airflow DAG is running. This can lead to an excessive usage of resources, and breaks the assumption that Airflow is a scheduler; it becomes also the data processing platform. In this session, we will see how to use TFX with third party services, on top of Google Cloud Platform. The data processing steps can be run in Dataflow, Spark, Flink and other runners (parallelizing the processing of data and scaling up to petabytes), and the training steps can be run in Vertex or other external services. After this workshop, you will have learnt how to externalize any TFX heavyweight computing outside Airflow, while maintaining Airflow as the orchestrator for your machine learning pipelines.

The SLAyer your Data Pipeline Needs

2022-07-01 · Airflow Summit 2022

session

by Eden Gluska

Prometheus

Airflow has an inherent SLA alert mechanism. When the scheduler sees such an SLA miss for some task, it sends an alert by email. The problem is, that this email is nice, but we can’t really know when each task is eventually successful. Moreover, even if there is such an email upon success following an SLA miss, it does not give us a good view of the current status at any given time. In order to solve this, we developed SLAyer, an application that gets information of SLA misses from Airflow’s database and reports the current status to Prometheus, provides metrics per dag, task, and execution date currently in violation of its SLA.

The tale of a startup's data journey and its growing need for orchestration

2022-07-01 · Airflow Summit 2022

session

by Maxime Beauchemin (Preset)

Analytics BigQuery dbt Fivetran Modern Data Stack

This talk tells the story of how we have approached data and analytics as a startup at Preset and how the need for a data orchestrator grew over time. Our stack is (loosely) Fivetran/Segment/dbt/BigQuery/Hightouch, and we finally got to a place where we suffer quite a bit from not having an orchestrator and are bringing in Airflow to address our orchestration needs. This talk is about how startups approach solving data challenges, the shifting role of the orchestrator in the modern data stack, and the growing need for an orchestrator as your data platform becomes more complex.

Using Apache Airflow to orchestrate workflows across hybrid environments

2022-07-01 · Airflow Summit 2022

session

by Ricardo Sueiras (AWS)

Cloud Computing

According to analysts, 87 percent of enterprises have already adopted hybrid cloud strategies ( https://www.flexera.com/blog/industry-trends/trend-of-cloud-computing-2020/) . Customers have many reasons why they need to support hybrid environments, from maximising the value from heritage systems, to meeting local compliance and data processing regulations. As they build their data pipelines, they increasingly need to be able to orchestrate those across on-premesis and cloud environments. In this session, I will share how you can leverage Apache Airflow to orchestrate a workflow using data sources inside and outside the cloud.

Using the Fivetran Airflow Provider

2022-07-01 · Airflow Summit 2022

session

by Annie Kaufman , Spencer Weeks

Fivetran

Fivetran’s Airflow provider allows Recharge to manage our connector syncs alongside our other DAGs orchestrating related components of our core data pipelines. The provider has enabled increased flexibility on sync schedules, custom alerting, and quicker response times to failures.

Vega: Unifying Machine Learning Workflows at Credit Karma using Apache Airflow

2022-07-01 · Airflow Summit 2022

session

by Nicholas Pataki (Credit Karma) , Raj Katakam (Credit Karma) , Debasish Das (Credit Karma)

AI/ML API Beam BigQuery Cloud Computing ETL/ELT Python TensorFlow

At Credit Karma, we enable financial progress for more than 100 million of our members by recommending them personalized financial products when they interact with our application. In this talk we are introducing our machine learning platform to build interactive and production model-building workflows to serve relevant financial products to Credit Karma users. Vega, Credit Karma’s Machine Learning Platform, has 3 major components: 1) QueryProcessor for feature and training data generation, backed by Google BigQuery, 2) PipelineProcessor for feature transformations, offline scoring and model-analysis, backed by Apache Beam 3) ModelProcessor for running Tensorflow and Scikit models, backed by Google AI Platform, which provides data scientists the flexibility to explore different kinds of machine learning or deep learning models, ranging from gradient boosted trees to neural network with complex structures Vega exposed a unified Python API for Feature Generation, Modeling ETL, Model Training and Model Analysis. Vega supports writing interactive notebooks and python scripts to run these components in local mode with sampled data and in cloud mode for large scale distributed computing. Vega provides the ability to chain the processors provided by data scientists through Python code to define the entire workflow. Then it automatically generates the execution plan for deploying the workflow on Apache Airflow for running offline model experiments and refreshes. Overall, with the unified python API and automated Airflow DAG generation, Vega has improved the efficiency of ML Engineering. Using Airflow we deploy more than 20K features and 100 models daily

Well-Architected Workflows in Apache Airflow

2022-07-01 · Airflow Summit 2022

session

by Uma Ramadoss

Resilient systems have the capability to recover when stressed by load, bugs in the workflow, and failure of any task. Reliability of the infrastructure or platform is not sufficient to run workflows reliably. It is critical to bring in resiliency practices during the design and build phase of the workflow to improve reliability, performance and operational aspects of the workflow. In this session, We will go through Architecture of the Airflow through the lens of reliability Idempotency Designing for failures Applying back pressure Best practices What we do not cover: Infrastructure/Platform/Product reliability

What's new in Airflow 2.3?

2022-07-01 · Airflow Summit 2022

session

by Kaxil Naik

JSON

This session will talk about the awesome new features the community has built that would be part of Airflow 2.3. Highlights: Dynamic Task Mapping DB. Downgrades Pruning old DB records Connections using JSON UI Improvements

What's New with Amazon Managed Workflows for Apache Airflow (MWAA)

2022-07-01 · Airflow Summit 2022

session

by John Jackson

AWS

In this session we will discuss the latest features of Amazon Managed Workflows for Apache Airflow (MWAA) as well as some tips and tricks to get the most out of the service. We’ll also discuss the AWS commitment to the Apache Airflow project and what we’re doing to stay connected and contribute to the community.

Wisdoms learnt when contributing to Apache Airflow

2022-07-01 · Airflow Summit 2022

session

by Bowrna Prabhakaran

CI/CD Python

In this talk, I am going to share things that I learned while contributing to Apache Airflow. I am an Outreachy Intern for Apache Airflow. I made my first contribution to Open Source in the Apache Airflow project. I will also add a short description about myself and my experience working in Software Engineering and how i needed help in contributing to open source and ended up as an Intern for Outreachy. I also like to share about my first contribution towards Apache Airflow in its doc and how much confidence it gave me to continue contributing to it. Key things that I learned when contributing to Apache Airflow are: Clear communication in written form is very powerful. Code is not an asset and don’t worry about throwing it away. Don’t feel shy about asking questions. Open Source is a rich ecosystem where each projects help each other and thrive. Trivial things became no more trivial to me. While the above things are overall learning about open source contribution, I had specific important learnings for me which include writing unit tests, got to communicate with developers across the globe, improved written style of communication, knowing about many python libraries, understanding the CI pipeline.

Workshop: Contributing to Apache Airflow

2022-07-01 · Airflow Summit 2022

session

by Elad Kalif , Jarek Potiuk (Apache Software Foundation)

This workshop is sold out By attending this workshop, you will learn how you can become a contributor to the Apache Airflow project. You will learn how to setup a development environment, how to pick your first issue, how to communicate effectively within the community and how to make your first PR - experienced committers of Apache Airflow project will give you step-by-step instructions and will guide you in the process. When you finish the workshop you will be equipped with everything that is needed to make further contributions to the Apache Airflow project.

Workshop: Running Airflow within Cloud Composer

2022-07-01 · Airflow Summit 2022

session

by Bartosz Jankiewicz , Rafal Biegacz , Filip Knapik , Leah Cole , Przemek Więch

AI/ML BigQuery CI/CD Cloud Computing Dataflow GCP Cloud Composer

This workshop is sold out Hands on workshop showing how easy it is to deploy Airflow in a public Cloud. Workshop consists of 3 parts: Setting up Airflow environment and CI/CD for DAG deployment Authoring a DAG Troubleshoot Airflow DAG/Task execution failures This workshop will be based on Cloud Composer ( https://cloud.google.com/composer ) This workshop is mostly targeted at Airflow newbies and users who would like to learn more about Cloud Composer and how to develop DAGs using Google Cloud Platform services like BigQuery, Vertex AI, Dataflow.

Page 3 of 3

← Previous

1 2 3