Deploying bad DAGs to your Airflow environment can wreak havoc. This talk provides an opinionated take on a monorepo structure for GCP data pipelines leveraging BigQuery and Dataflow, plus a series of CI tests for validating your Airflow DAGs before deploying them to Cloud Composer. Composer makes deploying Airflow infrastructure easy and reduces deploying DAGs to “just dropping files in a GCS bucket”. However, this opens the opportunity for many organizations to shoot themselves in the foot by not following a strong CI/CD process. Pushing bad DAGs to Composer can manifest in a very sad Airflow webserver and many wasted DAG-parsing cycles in the scheduler, disrupting other teams that share the environment. This talk outlines a series of recommended continuous integration tests to validate PRs that update or add Airflow DAGs before pushing them to your GCP environment, along with a small “DAGs deployer” application that manages deploying DAGs following best practices. The talk walks through automating these tests with Cloud Build, but the approach could easily be ported to your favorite CI/CD tool.
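The talk's actual test suite is not shown here, but one common CI check of this kind can be sketched as follows: verify that every candidate DAG file at least parses and appears to define a DAG, before it ever reaches the Composer GCS bucket. This minimal sketch uses only the Python standard library (so it runs in CI without a full Airflow install); the helper names `dag_file_issues` and `validate_dags_dir` are hypothetical, not from the talk.

```python
# Hypothetical sketch of one CI validation step: catch syntax errors and
# files that never instantiate a DAG, without importing Airflow itself.
import ast
from pathlib import Path


def dag_file_issues(path: Path) -> list[str]:
    """Return a list of problems found in a single candidate DAG file."""
    try:
        tree = ast.parse(path.read_text(), filename=str(path))
    except SyntaxError as exc:
        return [f"{path.name}: syntax error: {exc.msg} (line {exc.lineno})"]
    # Heuristic: look for any call to DAG(...). A stricter suite would also
    # import the module with Airflow installed and check DagBag import errors.
    has_dag = any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "DAG"
        for node in ast.walk(tree)
    )
    if not has_dag:
        return [f"{path.name}: no DAG(...) instantiation found"]
    return []


def validate_dags_dir(dags_dir: Path) -> list[str]:
    """Aggregate issues across every .py file in the DAGs folder."""
    issues: list[str] = []
    for py_file in sorted(dags_dir.glob("*.py")):
        issues.extend(dag_file_issues(py_file))
    return issues
```

A CI job (Cloud Build or otherwise) would run this against the repo's DAGs directory and fail the PR if the returned issue list is non-empty, keeping broken files out of the shared scheduler's parsing loop.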
Topic
Dataflow (Google Cloud Dataflow) — data_processing, stream_processing, google_cloud
Activity Trend
Peak 8/qtr, 2020-Q1 to 2026-Q1
Top Events
Data Engineering Podcast — 19
Google Cloud Next '24 — 7
Google Cloud Next '25 — 5
O'Reilly Data Engineering Books — 4
O'Reilly Data Science Books — 3
Airflow Summit 2022 — 2
DATA MINER Big Data Europe Conference 2020 — 1
Data Council 2023 — 1
Experiencing Data w/ Brian T. O’Neill (AI & data product management leadership—powered by UX design) — 1
Airflow Summit 2020 — 1
Making Data Simple — 1
Straight Data Talk — 1
Filtering by: Airflow Summit 2020