We run a similar pattern of DAGs for different data quality dimensions such as accuracy, timeliness, and completeness. Building each of these by hand would mean duplicating code, copy-pasting, or asking people to write the same logic again and again, with all the human error that invites. To avoid that, we do two things: we use DagFactory to dynamically generate DAGs from a short YAML definition of the steps in each DQ check, and we hide this behind a UI hooked into a GitHub pull-request step, so the user only provides a few inputs or picks from dropdowns and a YAML DAG is generated and committed for them. This highlights the potential of DagFactory to hide Airflow Python code from users and make DAG authoring accessible to data analysts and business intelligence teams as well as software engineers, while reducing human error. YAML is an ideal format for generating code and opening a PR, and DagFactory is a natural fit for it. All of this runs in GCP Cloud Composer.
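As a rough illustration, a generated DQ-check DAG could look like the YAML below. This is a minimal sketch, not our actual configuration: the DAG name, task names, SQL, and the publish_dq_results callable are hypothetical, and only the general dag-factory layout (default args, tasks, operator import paths, dependencies) follows the library's documented format.

```yaml
# Hypothetical dag-factory YAML for one data quality check (illustrative only).
default:
  default_args:
    owner: data-quality
    start_date: 2024-01-01
    retries: 1
  schedule_interval: "0 6 * * *"        # run the check daily at 06:00

dq_completeness_orders:                 # one generated DAG per DQ check
  description: "Completeness check on the orders table"
  tasks:
    run_check:
      operator: airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator
      configuration:
        query:
          query: "SELECT COUNT(*) FROM analytics.orders WHERE order_id IS NULL"
          useLegacySql: false
    publish_results:
      operator: airflow.operators.python.PythonOperator
      python_callable_name: publish_dq_results          # hypothetical helper
      python_callable_file: /usr/local/airflow/dags/dq_callables.py
      dependencies: [run_check]
```

A small Python file in the Composer DAGs folder then points dag-factory at these YAML files (for example via its DagFactory loader calling generate_dags), so every config committed through the UI's PR becomes a scheduled DAG without anyone writing Airflow Python by hand.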