Data engineers have shifted from delivering data for internal analytics applications to building customer-facing data products. With that shift comes a new level of operational rigor needed to earn trust and confidence in the data. How do you hold data pipelines to the same standards as traditional software applications? Can you apply principles from the field of SRE to the world of data? In this talk, we'll explore how we've seen this shift play out across Astronomer's customer base and highlight best practices learned from the most critical data product applications we've seen. We'll hear from Astronomer's own data team about their transformation from analytics to data products. And we'll showcase a new product we're building to help data teams around the world solve exactly this problem!
Airflow Summit 2024 program
There are three certainties in life: death, taxes, and data pipelines failing. Pipelines fail for many reasons: you run out of memory, your credentials expire, an upstream data source turns out to be unreliable, and so on. But there are patterns we can learn from! Join us as we walk through our analysis of a massive dataset of Airflow failure logs. We'll show how we used natural language processing and dimensionality reduction methods to explore the latent space of Airflow task failures in order to cluster, visualize, and understand them. We'll conclude by covering mitigation methods for the most common task failure reasons and showing how Airflow can be used to build an MLOps platform that turns this one-time analysis into a reliable, recurring activity.
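To give a feel for what clustering failure logs by their text can look like, here is a minimal sketch under stated assumptions. It is not the method used in the talk: it vectorizes each message with smoothed TF-IDF and groups messages with a simple threshold-based "leader" clustering, and it deliberately skips the dimensionality-reduction and visualization steps. The failure messages, the `threshold` value, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(message: str) -> list[str]:
    """Lowercase and keep alphabetic tokens only, dropping numbers and punctuation."""
    return re.findall(r"[a-z]+", message.lower())


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({
            term: (c / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, c in counts.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_failures(messages: list[str], threshold: float = 0.25) -> list[list[int]]:
    """Hypothetical helper: leader clustering over TF-IDF vectors.

    Each message joins the existing cluster whose first member (the leader)
    is most similar, if that similarity reaches `threshold`; otherwise it
    starts a new cluster. Returns lists of message indices.
    """
    vectors = tfidf_vectors([tokenize(m) for m in messages])
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for c, members in enumerate(clusters):
            sim = cosine(vec, vectors[members[0]])  # compare against the leader
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(i)
        else:
            clusters.append([i])
    return clusters


# Hypothetical failure messages; not drawn from the dataset analyzed in the talk.
FAILURES = [
    "Out of memory: task killed",
    "Worker killed: out of memory",
    "Credentials expired: token refresh failed",
    "Token expired for connection",
    "Upstream source timeout",
]

if __name__ == "__main__":
    for members in cluster_failures(FAILURES):
        print([FAILURES[i] for i in members])
```

On these toy messages the memory errors and the expired-credential errors each land in their own cluster and the timeout stays alone. A real analysis over a large log corpus would likely need dense embeddings, a dimensionality-reduction step before clustering, and a tuned rather than hand-picked threshold.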