talk-data.com

Speaker

Howie Wang

4 talks

Member of Technical Staff at OpenAI

Talks & appearances

4 activities · Newest first

As organizations grow, the task of creating and managing Airflow DAGs efficiently becomes a challenge. In this talk, we will delve into innovative approaches to streamlining Airflow DAG creation using YAML. By leveraging YAML configuration, we allow users to dynamically generate Airflow DAGs without requiring Python expertise or deep knowledge of Airflow primitives. We will showcase the significant benefits of this approach, including eliminating duplicate configurations, simplifying DAG management for a large group of workflows, and ultimately enhancing productivity within large organizations. Join us to learn practical strategies to optimize workflow orchestration, reduce development overhead, and facilitate seamless collaboration across teams.
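A minimal sketch of the idea, assuming a hypothetical config schema: `defaults`, `dags`, `tasks` and `depends_on` are illustrative names, not the format used in the talk. Shared settings are declared once under `defaults` and merged into each DAG, so they are not duplicated. Each DAG's task dependencies are then validated and ordered with the standard library's `graphlib`. To keep the sketch runnable without PyYAML or Airflow, the YAML appears in a comment and `CONFIG` holds the equivalent parsed dict, which is what `yaml.safe_load` would return.

```python
from graphlib import TopologicalSorter

# Hypothetical config schema (not a standard Airflow format or the talk's format).
# CONFIG is the dict that yaml.safe_load would return for a YAML file such as:
#
#   defaults: {owner: data-eng, retries: 2, schedule: "@daily"}
#   dags:
#     - dag_id: sales_daily
#       tasks:
#         - {task_id: extract, command: python extract.py}
#         - {task_id: transform, command: python transform.py, depends_on: [extract]}
#         - {task_id: load, command: python load.py, depends_on: [transform]}
#     - dag_id: sales_hourly
#       schedule: "@hourly"
#       retries: 0
#       tasks:
#         - {task_id: refresh, command: python refresh.py}
CONFIG = {
    "defaults": {"owner": "data-eng", "retries": 2, "schedule": "@daily"},
    "dags": [
        {
            "dag_id": "sales_daily",
            "tasks": [
                {"task_id": "extract", "command": "python extract.py"},
                {"task_id": "transform", "command": "python transform.py", "depends_on": ["extract"]},
                {"task_id": "load", "command": "python load.py", "depends_on": ["transform"]},
            ],
        },
        {
            "dag_id": "sales_hourly",
            "schedule": "@hourly",
            "retries": 0,
            "tasks": [{"task_id": "refresh", "command": "python refresh.py"}],
        },
    ],
}


def build_dag_specs(config):
    """Expand a declarative config into per-DAG specs with merged settings and a task order."""
    defaults = config.get("defaults", {})
    specs = {}
    for dag in config["dags"]:
        dag_id = dag["dag_id"]
        if dag_id in specs:
            raise ValueError(f"duplicate dag_id: {dag_id}")
        # Per-DAG keys override the shared defaults, so common settings are written once.
        overrides = {k: v for k, v in dag.items() if k not in ("dag_id", "tasks")}
        settings = {**defaults, **overrides}

        # Map each task to the set of tasks it depends on (its upstream tasks).
        upstream = {}
        for task in dag["tasks"]:
            task_id = task["task_id"]
            if task_id in upstream:
                raise ValueError(f"{dag_id}: duplicate task_id {task_id}")
            upstream[task_id] = set(task.get("depends_on", []))

        # Every dependency must name a task defined in the same DAG.
        missing = set().union(*upstream.values()) - set(upstream)
        if missing:
            raise ValueError(f"{dag_id}: unknown dependencies {sorted(missing)}")

        # static_order() raises graphlib.CycleError (a ValueError subclass) on cycles.
        order = list(TopologicalSorter(upstream).static_order())
        specs[dag_id] = {
            "settings": settings,
            "commands": {t["task_id"]: t["command"] for t in dag["tasks"]},
            "upstream": upstream,
            "task_order": order,
        }
    return specs
```

Turning each spec into a real DAG would then be a thin loop in a DAG file: map `settings` onto the DAG's schedule and `default_args`, create one operator per entry in `commands` (for example a `BashOperator`), and wire the edges from `upstream` with `>>`. Keeping the loader independent of Airflow makes config validation testable without a scheduler.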

At Apple, we are building a self-serve data platform based on Airflow. Self-serve means users can create, deploy, and run their DAGs freely. With the logs and metrics we provide, users are able to test or troubleshoot DAGs on their own. Today, a common use case is that users want to test only one or a few tasks in their DAG. However, when they trigger the DAG, all tasks run, not just the ones they are interested in. To save time and resources, many users manually mark each task they want to skip as complete. Can we do better than that? Is there an easy-peasy way to skip tasks? In this lightning talk, we would like to share the challenges we had, the solution we came up with, and the lessons we learned.
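The abstract does not describe the solution the team landed on, so what follows is only a hypothetical sketch of one common pattern: the user names the tasks to run in the run conf at trigger time, and every other task skips itself. The `only_tasks` key and the `plan_partial_run` helper are illustrative assumptions, not Airflow built-ins. The pure-Python planner runs on its own. The commented lines show, roughly, how a task could act on it by raising `AirflowSkipException`.

```python
# Hypothetical convention: the user triggers the DAG with a run conf such as
# {"only_tasks": ["transform"]}, e.g. via `airflow dags trigger --conf '...' <dag_id>`.
# "only_tasks" is an illustrative key, not an Airflow built-in.
ONLY_TASKS_KEY = "only_tasks"


def plan_partial_run(all_task_ids, conf):
    """Split a DAG's task ids into (to_run, to_skip) from an optional selection in the run conf."""
    all_ids = set(all_task_ids)
    selected = (conf or {}).get(ONLY_TASKS_KEY)
    if not selected:
        # No selection: behave like a normal, full run.
        return sorted(all_ids), []
    unknown = set(selected) - all_ids
    if unknown:
        # Fail fast on typos instead of silently skipping every task.
        raise ValueError(f"unknown task ids in {ONLY_TASKS_KEY!r}: {sorted(unknown)}")
    return sorted(set(selected)), sorted(all_ids - set(selected))


# Inside an Airflow task, the check could look roughly like this (not run here):
#
#   from airflow.exceptions import AirflowSkipException
#   _, to_skip = plan_partial_run(dag.task_ids, context["dag_run"].conf)
#   if context["task"].task_id in to_skip:
#       raise AirflowSkipException("not selected for this run")
```

One caveat this sketch does not address: under Airflow's default `all_success` trigger rule, a task whose upstream task was skipped is skipped as well. A selected task downstream of a skipped one would therefore not run unless its trigger rule is relaxed, for example to `none_failed`.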