The Apache Airflow community is so large and active that it’s tempting to take the view that “if it ain’t broke don’t fix it.” In a community as in a codebase, however, improvement and attention are essential to sustaining growth. And bugs are just as inevitable in community management as they are in software development. If only the fixes were, too! Airflow is large and growing because users love Airflow and our community. But what steps could be taken to enhance the typical user’s and developer’s experience of the community? This talk will provide an overview of potential learnings for Airflow community management efforts, such as project governance and analytics, derived from the speaker’s experience managing the OpenLineage and Marquez open-source communities. The talk will answer questions such as: What can we learn from other open-source communities when it comes to supporting users and developers and learning from them? For example, what options exist for getting historical data out of Slack despite the limitations of the free tier? What tools can be used to make adoption metrics more reliable? What are some effective supplements to asynchronous governance?
talk-data.com
Speaker
Michael Robinson
3
talks
Filter by Event / Source
Talks & appearances
3 activities · Newest first
Airflow uses SQLAlchemy under the hood but up to this point has not exploited the tool’s capacity to produce detailed metadata about queries, tables, columns, and more. In fact, SQLAlchemy ships with an event listener that, in conjunction with OpenLineage, offers tantalizing possibilities for enhancing the development process – specifically in the areas of monitoring and debugging. SQLAlchemy’s event system features a Session object and ORMExecuteState mapped class that can be used to intercept statement executions and emit OpenLineage RunEvents as executions occur. In this talk, Michael Robinson from the community team at Astronomer will provide an overview and demo of new SQLAlchemyCollector and OpenLineageAdapter classes for leveraging SQLAlchemy’s event system to emit OpenLineage events as DAGs run.
This workshop is sold out Data lineage might seem like a complicated and unapproachable topic, but that’s only because data pipelines are complicated. The core concept is straightforward: trace and record the journey of datasets as they travel through a data pipeline. Marquez, a lineage metadata server, is a simple thing designed to watch complex things. It tracks the movement of data through complex pipelines using a straightforward, clear object model of Jobs, Datasets, and Runs. The information it gathers can be used to help you more effectively understand, communicate, and solve problems. The interactive UI allows you to see exactly where any inefficiencies have developed or datasets have become compromised. In this workshop, you will learn how to collect and visualize lineage from a basic Airflow pipeline using Marquez. You will need to understand the basics of Airflow, but no experience with lineage is required.