Testing is an important part of the DataOps life cycle, giving teams confidence in the integrity of their data as it moves downstream to production systems. But what happens when testing doesn’t catch all of your bad data and “unknown unknown” data quality issues fall through the cracks? Fortunately, data engineers can apply a thing or two from DevOps best practices to tackle data quality at scale with circuit breakers, a novel approach to stopping bad data from actually entering your pipelines in the first place. In this talk, Prateek Chawla, Founding Team Member and Technical Lead at Monte Carlo, will discuss what circuit breakers are, how to integrate them with your Airflow DAGs, and what this looks like in practice. Time permitting, Prateek will also walk through how to build and automate Airflow circuit breakers across multiple cascading pipelines with Python and other common tools.
talk-data.com
Topic
Monte Carlo
data_observability
data_reliability
data_quality
2
tagged
Activity Trend
12
peak/qtr
2020-Q1
2026-Q1
Top Events
Filtering by:
Airflow Summit 2022
×
“Why is my data missing?” “Why didn’t my Airflow job run?” “What happened to this report?” If you’ve been on the receiving end of any of these questions, you’re not alone. As data pipelines become increasingly complex and companies ingest more and more data, data engineers are on the hook for troubleshooting where, why, and how data quality issues occur, and most importantly, fixing them so systems can get up and running again. In this talk, Francisco Alberini, Monte Carlo’s first product hire, discusses the three primary factors that contribute to data quality issues and how data teams can leverage Airflow, dbt, and other solutions in their arsenal to conduct root cause analysis on their data pipelines.