We run a similar pattern of DAGs for different data quality dimensions such as accuracy, timeliness, and completeness. Building each of these by hand would mean duplicating code, copy-pasting, or asking people to write the same logic again and again, with all the human error that invites. To avoid that, we do two things: we use DagFactory to dynamically generate DAGs from a short YAML definition of the steps in each DQ check, and we hide this behind a UI hooked into a GitHub pull-request step, so the user only provides a few inputs or picks from dropdowns and a YAML DAG is generated and committed for them. This highlights the potential of DagFactory to hide Airflow Python code from users and make DAG authoring accessible to data analysts and business intelligence teams as well as software engineers, while reducing human error. YAML is an ideal format for generating code and opening a PR, and DagFactory is a natural fit for it. All of this runs in GCP Cloud Composer.
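As a rough illustration, a generated DQ-check DAG could look like the YAML below. This is a minimal sketch, not our actual configuration: the DAG name, task names, SQL, and the publish_dq_results callable are hypothetical, and only the general dag-factory layout (default args, tasks, operator import paths, dependencies) follows the library's documented format.

```yaml
# Hypothetical dag-factory YAML for one data quality check (illustrative only).
default:
  default_args:
    owner: data-quality
    start_date: 2024-01-01
    retries: 1
  schedule_interval: "0 6 * * *"        # run the check daily at 06:00

dq_completeness_orders:                 # one generated DAG per DQ check
  description: "Completeness check on the orders table"
  tasks:
    run_check:
      operator: airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator
      configuration:
        query:
          query: "SELECT COUNT(*) FROM analytics.orders WHERE order_id IS NULL"
          useLegacySql: false
    publish_results:
      operator: airflow.operators.python.PythonOperator
      python_callable_name: publish_dq_results          # hypothetical helper
      python_callable_file: /usr/local/airflow/dags/dq_callables.py
      dependencies: [run_check]
```

A small Python file in the Composer DAGs folder then points dag-factory at these YAML files (for example via its DagFactory loader calling generate_dags), so every config committed through the UI's PR becomes a scheduled DAG without anyone writing Airflow Python by hand.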