Airflow Summit
session
2022-07-01
What is data lineage and why should I care?
Event: Airflow Summit 2022
Speakers: Ross Turk (Datakin)
Description
If a job fails, how can you find out which downstream datasets have become out of date? Can you be confident that jobs are consuming fresh, high-quality data from their upstream sources? How might you predict the impact of a planned change on distant corners of the pipeline? These questions become easier to answer once you have a complete understanding of data lineage: the complex set of relationships between all of your jobs and datasets. In this talk, Ross Turk from Datakin will provide a quick introduction to the core concepts behind data lineage and an overview of common architectural approaches.
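To make the first question concrete, here is a minimal sketch, not taken from the talk, of how lineage might be modeled as a graph of jobs and datasets and then walked to list everything downstream of a failed job. The job names, dataset names, the `LINEAGE` structure, and the `downstream_datasets` helper are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical lineage metadata: each job lists the datasets it reads and writes.
LINEAGE = {
    "ingest_orders": {"inputs": ["raw.orders"], "outputs": ["staging.orders"]},
    "clean_orders": {"inputs": ["staging.orders"], "outputs": ["core.orders"]},
    "load_customers": {"inputs": ["raw.customers"], "outputs": ["core.customers"]},
    "daily_revenue": {
        "inputs": ["core.orders", "core.customers"],
        "outputs": ["reports.revenue"],
    },
}


def downstream_datasets(failed_job, lineage=LINEAGE):
    """Return a sorted list of datasets that may be stale if failed_job did not run."""
    # Build a reverse index: dataset -> jobs that consume it.
    consumers = {}
    for job, io in lineage.items():
        for dataset in io["inputs"]:
            consumers.setdefault(dataset, []).append(job)

    # Breadth-first walk: start from the failed job's outputs, then follow
    # every job that reads a stale dataset to that job's own outputs.
    stale = set()
    queue = deque(lineage[failed_job]["outputs"])
    while queue:
        dataset = queue.popleft()
        if dataset in stale:
            continue
        stale.add(dataset)
        for job in consumers.get(dataset, []):
            queue.extend(lineage[job]["outputs"])
    return sorted(stale)


if __name__ == "__main__":
    print(downstream_datasets("clean_orders"))
```

In this toy graph, a failure in `clean_orders` marks `core.orders` and, through `daily_revenue`, `reports.revenue` as potentially out of date. Running the same walk over the reverse direction (outputs to inputs) would address the upstream-freshness question in the same way.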