talk-data.com

Event

Airflow Summit 2023

2023-07-01 · Airflow Summit

Activities tracked

5

Airflow Summit 2023 program

Filtering by: ETL/ELT

Sessions & talks

Showing 1–5 of 5 · Newest first


Accelerating Data Delivery: How the FT automated its ETL pipelines with Airflow

2023-07-01
session

Inside the Financial Times, we’ve been gradually moving our batch data processing from a custom solution to Airflow. To enable various teams within the company to use Airflow more effectively, we’ve been extending the system’s self-service capabilities, including giving teams ownership of their DAGs and separating resources such as connections. The batch data ingestion processes are the main ETL-like jobs that we run on Airflow. Creating a new job used to be a manual, repetitive task: receiving the data specification, creating the requisite tables in our data warehouse, and writing the DAG that would move the data there. Airflow allowed us to automate this process to a degree that surprised us, completely removing the need to write DAG code. In this talk we will describe what the current process of creating a new ETL workflow looks like, along with our plans for further improvements.
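The abstract describes deriving ETL jobs from a data specification instead of hand-writing DAG code. A minimal, dependency-free sketch of that pattern is below; the specification fields and task names are hypothetical, not the FT's actual system.

```python
# Hypothetical sketch: turning a declarative data specification into an
# ordered list of ETL task names, in the spirit of the automation described
# above. In a real deployment these names would become Airflow tasks.

def build_etl_tasks(spec):
    """Derive ordered ETL task names from a dataset specification dict."""
    dataset = spec["dataset"]
    tasks = [f"create_table_{dataset}"]              # provision warehouse table
    for source in spec["sources"]:
        tasks.append(f"extract_{source}_{dataset}")  # one extract per source
    tasks.append(f"load_{dataset}")                  # load into the warehouse
    return tasks

spec = {"dataset": "page_views", "sources": ["s3", "api"]}
print(build_etl_tasks(spec))
```

Because the workflow is fully determined by the specification, onboarding a new dataset means writing a config file rather than a DAG.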

Airflow at Bloomberg: Leveraging dynamic DAGs for data ingestion

2023-07-01
session

Bloomberg’s Data Platform Engineering team powers some of the most valuable business and financial data on which Bloomberg clients rely. We recently built a configuration-driven system that allows non-engineers to onboard alternative datasets into the company’s ecosystem. This system uses Apache Airflow to orchestrate the data flow across different applications and Bloomberg Terminal functions. Our deployment is unusual in running over 1,500 dynamic DAGs, each tailored to a dataset’s needs, a scale very few Airflow users reach. In this talk, we will review our high-level Airflow architecture, how we leverage dynamic DAGs in our ETL pipeline, and some of the challenges we faced.
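The core pattern here, one dynamically generated DAG per dataset configuration, can be sketched without Airflow installed. The dataclass below stands in for an Airflow DAG object; in a real DAG file the loop would instead register `DAG` instances. All names are illustrative, not Bloomberg's code.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of configuration-driven dynamic pipeline generation:
# one pipeline object per dataset config. In Airflow, the loop body would
# construct a DAG; here a plain dataclass keeps the sketch dependency-free.

@dataclass
class Pipeline:
    dag_id: str
    schedule: str
    tasks: list = field(default_factory=list)

def generate_pipelines(configs):
    """Build one pipeline per dataset config, sharing a common task shape."""
    pipelines = {}
    for cfg in configs:
        p = Pipeline(dag_id=f"ingest_{cfg['name']}",
                     schedule=cfg.get("schedule", "@daily"))
        p.tasks = ["extract", "validate", "publish"]  # per-dataset parameters vary
        pipelines[p.dag_id] = p
    return pipelines

configs = [{"name": "alt_data_a"}, {"name": "alt_data_b", "schedule": "@hourly"}]
pipes = generate_pipelines(configs)
print(sorted(pipes))
```

Because each DAG is produced from configuration, non-engineers can onboard a new dataset by adding a config entry rather than writing pipeline code.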

Airflow at UniCredit: Our journey from mainframe scheduling to modern data processing

2023-07-01
session

Representing the Murex Reporting team at UniCredit, we would like to present our journey with Airflow and how, over the past two years, it has enabled us to automate and simplify our batch workflows. Compared with our previous rigid mainframe scheduling approach, we have created a robust and scalable framework complete with a CI/CD process, bringing the time to market of scheduling changes down from three days to one. By basing our solution on DAG networks joined by ResumeDagRunOperators and an array of custom-built plugins (such as static time predecessors), we were able to replicate the scheduling of our overnight ETL processes (approximately 8,000 tasks with many-to-many dependencies) in Airflow, meeting our bank reporting SLAs without performance regression and gaining massively improved process visibility and control. Our presentation will illustrate this journey and explore some of these customizations, which venture outside Airflow’s core functionality.
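The hard part of replicating a mainframe batch in a DAG scheduler is resolving many-to-many task dependencies into a valid run order. The stdlib sketch below illustrates that core problem with a toy dependency map (the task names are invented, and ResumeDagRunOperator is UniCredit's custom plugin, not shown here).

```python
from graphlib import TopologicalSorter

# Illustrative sketch (not UniCredit's code): resolving many-to-many task
# dependencies into a valid execution order, the core problem a scheduler
# faces with an overnight batch of thousands of interdependent tasks.

deps = {
    "load_trades":    {"extract_murex", "extract_refdata"},  # many-to-one
    "risk_report":    {"load_trades"},
    "finance_report": {"load_trades", "extract_refdata"},    # many-to-many
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # every task appears after all of its predecessors
```

At the scale described in the talk, the same ordering constraint is enforced by Airflow's scheduler across linked DAGs rather than computed in one pass.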

A Single Pane of Glass on Airflow using Astro Python SDK, Snowflake, dbt, and Cosmos

2023-07-01
session

ETL data pipelines are the bread and butter of data teams, which must design, develop, and author DAGs to accommodate various business requirements. dbt has become one of the most widely used tools for performing SQL transformations in the data warehouse, allowing teams to harness the power of queries at scale. Airflow users are constantly finding new ways to integrate dbt with the Airflow ecosystem and build a single pane of glass where data engineers can manage and administer their pipelines. Astronomer Cosmos, an open-source product, was introduced to integrate Airflow with dbt Core seamlessly, so you can now see your dbt pipelines fully integrated in Airflow. You will learn:

- How to integrate dbt Core with Airflow
- How to use Cosmos
- How to build data pipelines at scale
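The essence of what tools like Cosmos automate is mapping a dbt project's model graph onto orchestrator task dependencies: each model becomes a task, and each `ref` becomes an upstream edge. The sketch below uses a simplified, hypothetical manifest structure rather than Cosmos's actual API or dbt's full manifest schema.

```python
# Dependency-free sketch of the idea behind dbt-to-Airflow integration:
# each dbt model becomes one orchestrator task, and model-to-model
# dependencies become task dependencies. Manifest shape is simplified.

manifest = {
    "model.jaffle.stg_orders":   {"depends_on": []},
    "model.jaffle.stg_payments": {"depends_on": []},
    "model.jaffle.orders":       {"depends_on": ["model.jaffle.stg_orders",
                                                 "model.jaffle.stg_payments"]},
}

def to_task_graph(manifest):
    """Map each dbt model node to a task id plus its upstream task ids."""
    def task_id(node):
        return node.split(".")[-1]  # 'orders' from 'model.jaffle.orders'
    return {task_id(node): [task_id(d) for d in meta["depends_on"]]
            for node, meta in manifest.items()}

graph = to_task_graph(manifest)
print(graph)
```

With the graph derived automatically, a failed model can be retried as an individual task instead of rerunning the whole `dbt run`, which is one of the main benefits of this kind of integration.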

Beyond Data Engineering: Airflow for Operations

2023-07-01
session

Much of the world sees Airflow as a hammer and ETL tasks as nails, but in reality, Airflow is much more of a sophisticated multitool, capable of orchestrating a wide variety of complex workflows. Astronomer’s Customer Reliability Engineering (CRE) team is leveraging this potential in its development of Airline, a tool powered by Airflow that monitors Airflow deployments and sends alerts proactively when issues arise. In this talk, Ryan Hatter from Astronomer will give an overview of Airline. He’ll explain how it integrates with ZenDesk, Kubernetes, and other services to resolve customers’ problems more quickly, and in many cases, even before customers realize there’s an issue. Join us for a practical exploration of Airflow’s capabilities beyond ETL, and learn how proactive, automated monitoring can enhance your operations.
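The proactive-monitoring idea behind a tool like Airline can be illustrated with a simple staleness check: flag deployments whose heartbeat is overdue before anyone files a ticket. This is a hypothetical sketch under invented field names and thresholds, not Astronomer's implementation.

```python
# Hypothetical sketch of proactive deployment monitoring: scan recent
# heartbeat timestamps (seconds) and flag deployments that have gone quiet,
# so an alert can be raised before a user notices. Thresholds illustrative.

def find_stale_deployments(heartbeats, now, max_age_s=300):
    """Return ids of deployments whose last heartbeat is older than max_age_s."""
    return sorted(dep for dep, last_seen in heartbeats.items()
                  if now - last_seen > max_age_s)

heartbeats = {"prod-etl": 850, "dev-sandbox": 1290, "prod-ml": 600}
stale = find_stale_deployments(heartbeats, now=1300)
print(stale)
```

In the system described in the talk, a check like this would itself run as an Airflow task, with the alerting step integrating with ZenDesk and Kubernetes.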