talk-data.com


Speaker

Maciej Obuchowski

3 talks

Senior Software Engineer, Datadog


Talks & appearances



“More data lineage” was the second most popular feature request in the Airflow Survey 2023. However, even though OpenLineage was integrated into Airflow 2.7 through AIP-53, the most popular Operator in Airflow, PythonOperator, isn’t covered by lineage support. With the addition of the TaskFlow API, Airflow Datasets, Airflow ObjectStore, and many smaller changes, writing DAGs without using other operators is easier than ever. That’s why lineage collection in Airflow is moving beyond covering specific Operators to covering Hooks and Object Storage. In this session, you’ll learn how the newly added AIP-62 lets you author DAGs the way you love while keeping the benefits of a data pipeline that is well covered by lineage.

With native support for OpenLineage in Airflow, users can now observe and manage their data pipelines with ease. This talk will cover the benefits of using OpenLineage, how it is implemented in Airflow, practical examples of how to take advantage of it, and what’s on our roadmap. Whether you’re an Airflow user or a provider maintainer, this session will give you the knowledge to make the most of this tool.

OpenLineage is an open standard for metadata and lineage collection, designed to instrument jobs as they are running. The standard has become remarkably adept at capturing the lifecycle of data within an organization, and Airflow lets you make use of it through a convenient integration. Gathering data lineage has never been easier. In this talk, we’ll provide an up-to-date report on OpenLineage features and the Airflow integration: essential information for data governance architects and engineers.
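To make “instrument jobs as they are running” a little more concrete: an OpenLineage integration emits run events as a job starts and finishes, each naming the job, the run, and the datasets read and written. The sketch below builds two such events as plain dictionaries using only the Python standard library. It assumes the core RunEvent fields from the OpenLineage spec (eventType, eventTime, run.runId, job, inputs, outputs, producer, schemaURL); the spec version in the schema URL, the producer URI, and all job and dataset names are illustrative assumptions. In practice, the Airflow integration builds and sends these events for you.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed spec version; check the OpenLineage spec for the current schema URL.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"
# Hypothetical URI identifying whatever emits these events.
PRODUCER = "https://example.com/hypothetical-lineage-producer"


def make_run_event(event_type, run_id, job_namespace, job_name, inputs, outputs):
    """Build a minimal OpenLineage-style RunEvent as a plain dict.

    inputs/outputs are lists of (namespace, name) pairs identifying datasets.
    """
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},  # same runId ties START and COMPLETE together
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


# Hypothetical job and datasets, for illustration only.
run_id = str(uuid.uuid4())
db = "postgres://db.example.com:5432"
inputs = [(db, "analytics.public.orders")]
outputs = [(db, "analytics.public.daily_revenue")]

start = make_run_event("START", run_id, "my_airflow", "revenue_dag.aggregate", inputs, outputs)
complete = make_run_event("COMPLETE", run_id, "my_airflow", "revenue_dag.aggregate", inputs, outputs)

print(json.dumps(complete, indent=2))
```

Because both events share one runId, a lineage backend can pair them into a single run and draw an edge from the input table to the output table.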