We will describe how we built a system in Airflow for MySQL-to-Redshift ETL pipelines that are defined in pure Python using dataclasses. These dataclasses are then used to dynamically generate DAGs according to pipeline type. This setup lets us implement robust testing, validation, alerting, and documentation for our pipelines. We will also describe the performance improvements we achieved by upgrading to Airflow 2.0.
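To make the idea of dataclass-defined pipelines concrete, here is a minimal sketch of the pattern. It is not the speakers' implementation: `PipelineConfig`, `TASKS_BY_TYPE`, `build_dag_spec`, the two pipeline types, and every task and table name are hypothetical, chosen only for illustration. The sketch builds plain dictionaries instead of real Airflow `DAG` objects so that it runs without Airflow installed.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical mapping from pipeline type to an ordered list of task names.
TASKS_BY_TYPE: Dict[str, List[str]] = {
    "full_refresh": ["extract_all", "load_staging", "swap_tables"],
    "incremental": ["extract_new_rows", "load_staging", "merge_into_target"],
}


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical description of one MySQL-to-Redshift pipeline."""

    name: str
    source_table: str
    target_table: str
    pipeline_type: str = "full_refresh"
    schedule: str = "@daily"

    def __post_init__(self) -> None:
        # Validate when the config is defined, so a bad pipeline
        # fails in a unit test instead of at run time.
        if self.pipeline_type not in TASKS_BY_TYPE:
            raise ValueError(f"unknown pipeline_type: {self.pipeline_type!r}")
        if not self.name.isidentifier():
            raise ValueError(f"name must be a valid identifier: {self.name!r}")


def build_dag_spec(config: PipelineConfig) -> dict:
    """Turn one config into a DAG description chosen by pipeline type."""
    tasks = TASKS_BY_TYPE[config.pipeline_type]
    return {
        "dag_id": f"mysql_to_redshift__{config.name}",
        "schedule": config.schedule,
        "tasks": tasks,
        # Each task depends on the one before it (a linear chain).
        "edges": list(zip(tasks, tasks[1:])),
        # Documentation is derived from the same config object.
        "doc": f"Copies {config.source_table} (MySQL) to "
               f"{config.target_table} (Redshift).",
    }


# Pipelines are declared as data; the DAG descriptions are generated from them.
PIPELINES = [
    PipelineConfig("orders", "shop.orders", "analytics.orders", "incremental"),
    PipelineConfig("countries", "shop.countries", "analytics.countries"),
]

DAG_SPECS = {spec["dag_id"]: spec for spec in map(build_dag_spec, PIPELINES)}
```

In an actual Airflow DAG file, each generated description would presumably be turned into a `DAG` object and assigned to a module-level name so the scheduler can discover it. Because the configs are ordinary Python objects, they can also be validated and unit-tested without a running Airflow instance.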
Filtering by: Airflow Summit 2021
As a follow-up to https://airflowsummit.org/sessions/teaching-old-dag-new-tricks/, this talk shares the happy ending of how Scribd fully migrated its data platform to the cloud and Airflow 2.0. We will talk about the data validation tools and task trigger customizations the team built to smooth the transition. We will share how we completed an Airflow 2.0 migration that started with an unsupported MySQL version, along with metrics showing why everyone should perform the upgrade. Lastly, we will discuss how large-scale backfills (10 years' worth of runs) are managed and automated at Scribd.