talk-data.com talk-data.com

Topic

Dataflow

Google Cloud Dataflow

data_processing stream_processing google_cloud

1

tagged

Activity Trend

8 peak/qtr
2020-Q1 2026-Q1

Activities

Showing filtered results

Filtering by: Airflow Summit 2020 ×

Deploying bad DAGs to your Airflow environment can wreak havoc. This talk provides an opinionated take on a mono repo structure for GCP data pipelines leveraging BigQuery, Dataflow and a series of CI tests for validating your Airflow DAGs before deploying them to Cloud Composer. Composer makes deploying airflow infrastructure easy and deploying DAGs “just dropping files in a GCS bucket”. However, this opens the opportunity for many organizations to shoot themselves in the foot by not following a strong CI/CD process. Pushing bad dags to Composer can manifest in a really sad airflow webserver and many wasted DAG parsing cycles in the scheduler, disrupting other teams using the same environment. This talk will outline a series of recommended continuous integration tests to validate PRs for updating or deploying new Airflow DAGs before pushing them to your GCP Environment with a small “DAGs deployer” application that will manage deploying DAGs following some best practices. This talk will walk through explaining automating these tests with Cloud Build, but could easily be ported to your favorite CI/CD tool.