talk-data.com

Event

Airflow Summit 2023

2023-07-01 Airflow Summit

Activities tracked

5

Airflow Summit 2023 program

Filtering by: DWH

Sessions & talks

Accelerating Data Delivery: How the FT automated its ETL pipelines with Airflow

2023-07-01
session

Inside the Financial Times, we’ve been gradually moving our batch data processing from a custom solution to Airflow. To enable various teams within the company to use Airflow more effectively, we’ve been working on extending the system’s self-service capabilities. This includes giving teams ownership of their DAGs and separating resources such as connections. The batch data ingestion processes are the main ETL-like jobs that we run on Airflow. Creating a new job used to be a manual and repetitive task: receiving the data specification, creating the requisite tables in our data warehouse, and writing the DAG that would move the data there. Airflow allowed us to automate this process to a degree that surprised us, completely removing the need to write DAG code. We will use the talk to describe what the current process of creating a new ETL workflow looks like and our plans for further improvements.
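As a rough illustration of the idea in this abstract (deriving both the warehouse table and the job definition from a data specification rather than hand-writing DAG code), here is a minimal Python sketch. It is not the FT's implementation. `DatasetSpec`, `Column`, `create_table_sql`, and `build_job` are hypothetical names, the spec fields are assumptions, and the output is a plain dictionary that a DAG factory could consume rather than a real Airflow DAG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical data specification; field names are illustrative only."""
    name: str
    schedule: str
    columns: tuple[Column, ...]


def create_table_sql(spec: DatasetSpec) -> str:
    """Render the warehouse DDL implied by the spec."""
    cols = ", ".join(f"{c.name} {c.sql_type}" for c in spec.columns)
    return f"CREATE TABLE IF NOT EXISTS {spec.name} ({cols})"


def build_job(spec: DatasetSpec) -> dict:
    """Return a plain-dict job description that a DAG factory could consume."""
    return {
        "dag_id": f"ingest_{spec.name}",
        "schedule": spec.schedule,
        "tasks": [
            {"task_id": "create_table", "sql": create_table_sql(spec)},
            # The load step runs only after the table exists.
            {"task_id": "load_data", "upstream": ["create_table"]},
        ],
    }


# Assumed example spec; the dataset and column names are made up.
spec = DatasetSpec(
    name="page_views",
    schedule="@daily",
    columns=(Column("user_id", "VARCHAR"), Column("viewed_at", "TIMESTAMP")),
)
job = build_job(spec)
```

With a setup along these lines, adding a new ingestion job would mean adding a spec rather than writing a DAG file. How the FT actually represents specifications and generates DAGs is not described in the abstract.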

Airflow at Asurion: Simplified orchestration at petabyte scale

2023-07-01
session

Workload orchestration is at the heart of a successful data lakehouse implementation, especially for the “house” part: the data warehouse workloads, which are often complex because warehouse data by its nature carries dependency orchestration problems. We at Asurion have spent years perfecting our Airflow solution to make it a superpower for our data engineers. We have innovated in key areas such as a single operator for all use cases, automatic DAG code generation, custom UI components for data engineers, and monitoring tools. With more than a few million job runs per year on a platform with over three nines of availability, we have condensed years of learnings into ideas that can inspire and help other data enthusiasts. This session walks the audience through some blind spots and pain points of Airflow architecture, scaling, and engineering culture.

Airflow at Monzo: Evolving our data platform as the bank scales

2023-07-01
session
Jonathan Rainer, Ed Sparkes (Monzo - Making money work for everyone)

As a bank, Monzo has seen exponential growth in active users, from 1.6 million in 2019 to 5.8 million in 2022. At the same time, the number of data users and analysts has expanded from an initial team of 4 to 132. Alongside this growth, our infrastructure and tooling have had to evolve to deliver the same value at a new scale. From an Airflow installation deployed on a single monolithic instance, we now deploy atop Kubernetes and have integrated our Airflow setup into the bank’s backend systems. This talk charts the story of that expansion and the growing pains we’ve faced, and looks to the future of our use of Airflow. We’ll first discuss how data at Monzo works, from event capture to arrival in our data warehouse, before assessing the challenges of our Airflow setup. We’ll then dive into the re-platforming that was required to meet our growing data needs, and some of the unique challenges that come with serving an ever-growing user base and need for analysis and insight.

A Single Pane of Glass on Airflow using Astro Python SDK, Snowflake, dbt, and Cosmos

2023-07-01
session

ETL data pipelines are the bread and butter of data teams that must design, develop, and author DAGs to accommodate the various business requirements. dbt is becoming one of the most used tools to perform SQL transformations on the Data Warehouse, allowing teams to harness the power of queries at scale. Airflow users are constantly finding new ways to integrate dbt with the Airflow ecosystem and build a single pane of glass where Data Engineers can manage and administer their pipelines. Astronomer Cosmos, an open-source product, has been introduced to integrate Airflow with dbt Core seamlessly. Now you can easily see your dbt pipelines fully integrated on Airflow. You will learn how to integrate dbt Core with Airflow, how to use Cosmos, and how to build data pipelines at scale.

Building an Open Source Data Warehouse

2023-07-01
session

Volunteers in Saint Louis are using Airflow to build an open source data warehouse of real estate data (permits, assessments, violations, etc.), with an eye towards creating a national open data standard. This talk will focus on the unique challenges of running an open source data warehouse, and what it looks like to work with volunteers to create data pipelines.