This session shows how we run Airflow in a federated way across all our business units to build a cost-effective platform that accommodates different patterns of data integration, replication, and ML tasks, with DevOps tuning of DAGs across environments. The platform integrates with our open-source observability stack, giving our SREs consistent metrics, monitoring, and alerting for data tasks. We will share our opinionated DAG setup, including naming and folder-structure conventions, coding expectations such as dedicated XCom entries to report the number of processed elements, support for state in DAGs that require it, and configurable task capabilities such as the runner type for Apache Beam tasks. We will also cover the "DevOps DAGs" we deploy in every environment to handle platform maintenance and support.
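A minimal sketch of the two conventions the abstract mentions: a task that reports its processed-element count through a dedicated XCom entry, and a Beam runner selected from per-environment configuration. The key name `processed_elements`, the `BEAM_RUNNER` environment variable, and the transform itself are illustrative assumptions, not the speakers' actual code.

```python
"""Sketch of the DAG coding conventions described in the abstract.

Assumptions (not from the talk): the XCom key name, the env-var-based
runner configuration, and the placeholder transform.
"""
import os

# Beam runner is configurable per environment: DirectRunner for local
# or dev environments, DataflowRunner (or another) in production.
BEAM_RUNNER = os.environ.get("BEAM_RUNNER", "DirectRunner")


def process_batch(records, ti=None):
    """Process a batch and push the processed-element count to XCom.

    In a real DAG, `ti` is Airflow's TaskInstance (available via the
    task context); observability tooling can then scrape the
    'processed_elements' XCom entry consistently across all DAGs.
    """
    processed = [r.upper() for r in records]  # placeholder transform
    if ti is not None:
        ti.xcom_push(key="processed_elements", value=len(processed))
    return processed
```

Pushing the count under one agreed-upon key is what lets a single set of metrics, dashboards, and alerts work across every DAG, regardless of which team wrote it.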
talk-data.com
Topics: Apache Beam, data_processing, batch_processing, stream_processing
Activity Trend (chart): peak 2 events/qtr, 2020-Q1 to 2026-Q1
Top Events
- Data Engineering Podcast (3)
- O'Reilly Data Engineering Books (1)
- O'Reilly Data Science Books (1)
- Data Council Austin 2024 - Day 1 (1)
- Airflow Summit 2023 (1)
- Special Event: Beam Unconference organised by EEF, Alembic & bitcrowd (1)
- SciPy 2025 (1)
- DATA MINER Big Data Europe Conference 2020 (1)
- ADSP: Algorithms + Data Structures = Programs (1)
- Airflow Summit 2022 (1)
- Data Science Retreat Demo Day #38 (1)
- Making Data Simple (1)
Speaker: Jose Puertos