talk-data.com

Event

Airflow Summit 2023

2023-07-01 Airflow Summit

Activities tracked: 7

Airflow Summit 2023 program

Filtering by: GitHub

Sessions & talks

Airflow at GoDaddy: From on-prem to cloud to PaaS

2023-07-01
session

Discover the transformation of Airflow at GoDaddy: from its initial on-prem deployment, to its migration to the cloud, and finally to a Single Pane Orchestration model. This evolution has streamlined our Data Platform and improved governance. Our experience will be useful to anyone seeking to optimize their Airflow implementation and simplify their orchestration processes. Topics include:
- History and use cases
- Design, organizational decisions, and governance: examining the decision-making process and governance structure
- Migration to the cloud: the process of transitioning Airflow from on-premises to the cloud
- Data processing engines used with Airflow
- Challenges: obstacles faced during and after the migration and how they were overcome
- Demonstrating how Airflow can be integrated with a central Glue Catalog and a Data Lake Mesh model
- Single Pane Orchestration (PaaS) and custom reusable GitHub Actions: examining the benefits of a Single Pane Orchestration model
- Monitoring

Better Support for Using Multiple Namespaces with KubernetesExecutor

2023-07-01
session

Airflow’s KubernetesExecutor has supported multi_namespace_mode for a long time. This feature lets Airflow jobs run in different namespaces on the same Kubernetes cluster, for better isolation and easier management. However, it requires a cluster role for the Airflow scheduler, which can create security problems or be a blocker for some users. PR https://github.com/apache/airflow/pull/28047, which will become available in Airflow 2.6.0, resolves this issue by allowing Airflow users to specify multi_namespace_mode_namespace_list when using multi_namespace_mode. With it, no cluster role is needed, and users only need to ensure the scheduler has permissions in the listed namespaces rather than in all namespaces on the Kubernetes cluster. This talk aims to help you better understand the KubernetesExecutor and how to set it up more securely.
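Because the point of multi_namespace_mode_namespace_list is to replace one cluster-wide grant with per-namespace grants, a minimal sketch may help. It assumes an airflow.cfg along the lines of the comment at the top (the option lives in the [kubernetes_executor] section in Airflow 2.6) and generates a namespaced Role and RoleBinding for the scheduler's service account in each listed namespace. The helper names, the role name, and the exact pod permissions are assumptions for illustration; compare them against the RBAC templates in the official Helm chart before applying anything.

```python
import json
from typing import Any

# Assumed airflow.cfg settings (Airflow 2.6+):
#   [kubernetes_executor]
#   multi_namespace_mode = True
#   multi_namespace_mode_namespace_list = team-a,team-b

RBAC_API = "rbac.authorization.k8s.io"
ROLE_NAME = "airflow-scheduler-pods"  # hypothetical role name

# Assumed permissions for launching and watching worker pods; verify
# against the official Helm chart for your Airflow version.
POD_RULES: list[dict[str, Any]] = [
    {"apiGroups": [""], "resources": ["pods"],
     "verbs": ["create", "get", "list", "watch", "patch", "delete"]},
    {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get", "list"]},
]


def parse_namespace_list(raw: str) -> list[str]:
    """Split a comma-separated namespace list, dropping blanks and whitespace."""
    return [ns.strip() for ns in raw.split(",") if ns.strip()]


def namespaced_rbac(raw_namespaces: str, service_account: str,
                    sa_namespace: str) -> list[dict[str, Any]]:
    """Build one Role and one RoleBinding per namespace instead of a ClusterRole."""
    manifests: list[dict[str, Any]] = []
    for ns in parse_namespace_list(raw_namespaces):
        manifests.append({
            "apiVersion": f"{RBAC_API}/v1",
            "kind": "Role",
            "metadata": {"name": ROLE_NAME, "namespace": ns},
            "rules": POD_RULES,
        })
        manifests.append({
            "apiVersion": f"{RBAC_API}/v1",
            "kind": "RoleBinding",
            "metadata": {"name": ROLE_NAME, "namespace": ns},
            "roleRef": {"apiGroup": RBAC_API, "kind": "Role", "name": ROLE_NAME},
            # The scheduler's service account may live in a different namespace.
            "subjects": [{"kind": "ServiceAccount", "name": service_account,
                          "namespace": sa_namespace}],
        })
    return manifests


if __name__ == "__main__":
    # Print the generated manifests for inspection.
    print(json.dumps(namespaced_rbac("team-a,team-b", "airflow-scheduler", "airflow"),
                     indent=2))
```

The design choice is simply that permissions scale with the namespace list: adding a namespace to the config means adding one Role and one RoleBinding, and nothing grants access cluster-wide.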

Flexible DAG Trigger Forms (AIP-50)

2023-07-01
session

As Airflow users, we often use DagRun.conf attributes to control the content and flow of a DAG run. Previously, the Airflow UI only allowed triggering a run by entering the conf as JSON. This was technically feasible but not user friendly: a user had to model, check, and understand the JSON and enter parameters manually, with no option to validate them before triggering. Similar to Jenkins or GitHub/Azure Pipelines, we wanted a UI option to trigger a run and specify its parameters in a form. With Airflow 2.6.0, DAG.params are now used to render an entry form, and with a few options a user-friendly trigger UI can be implemented. This session shows how the new feature works and provides some examples of how to use it for your own purposes.
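To make the idea concrete, here is a minimal, stdlib-only sketch of what declaring params buys you: each parameter carries a default and a few JSON-Schema-style constraints, and a user's overrides are merged with the defaults and checked before a run starts. In a real DAG these definitions would be `airflow.models.param.Param` objects passed via `DAG(params=...)`, and Airflow performs the validation itself; the `ParamSpec` class and `resolve_conf` function below are hypothetical stand-ins, not Airflow's implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ParamSpec:
    """Hypothetical stand-in for a DAG param: a default plus simple constraints."""

    default: Any
    py_type: type
    enum: Optional[list] = None
    minimum: Optional[float] = None


def resolve_conf(specs: dict[str, ParamSpec], user_conf: dict[str, Any]) -> dict[str, Any]:
    """Merge user-supplied values over the defaults and validate each result.

    Keys in user_conf that are not declared in specs are ignored in this sketch.
    Raises ValueError on the first invalid value, before any run would start.
    """
    resolved: dict[str, Any] = {}
    for name, spec in specs.items():
        value = user_conf.get(name, spec.default)
        # bool is a subclass of int in Python, so reject it explicitly for int params.
        if not isinstance(value, spec.py_type) or (
            spec.py_type is int and isinstance(value, bool)
        ):
            raise ValueError(f"{name}: expected {spec.py_type.__name__}, got {value!r}")
        if spec.enum is not None and value not in spec.enum:
            raise ValueError(f"{name}: {value!r} is not one of {spec.enum}")
        if spec.minimum is not None and value < spec.minimum:
            raise ValueError(f"{name}: {value!r} is below the minimum {spec.minimum}")
        resolved[name] = value
    return resolved


# Declared once with the DAG; a trigger form would render one field per entry.
SPECS = {
    "environment": ParamSpec("dev", str, enum=["dev", "staging", "prod"]),
    "batch_size": ParamSpec(100, int, minimum=1),
}

print(resolve_conf(SPECS, {"environment": "prod"}))  # {'environment': 'prod', 'batch_size': 100}
```

Compared with typing raw JSON, the declared defaults mean a user only supplies what differs, and a typo such as `"environment": "qa"` is rejected up front instead of failing somewhere inside the run.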

How to Build a System Test Dashboard and Why You Should Do It

2023-07-01
session

System tests are executable DAGs that serve both as examples and as tests. With a simple pytest command, you can run an entire DAG. From a provider's point of view, they can be viewed as integration tests for all of the provider's operators and sensors. Running these system tests frequently and monitoring the results allows us to enforce stability, among many other benefits. In this presentation, we will explore how AWS built their system test environment, from the GitHub fork to the health dashboard that exists today… but more importantly, why you should do it as well!
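The abstract does not describe how the AWS dashboard is implemented, so the following is only a hypothetical sketch of the core aggregation such a dashboard needs: given a chronological log of system test results, compute each test's recent pass rate and flag the tests that are not fully green. The `SystemTestRun` record format, the window size, and the test names are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemTestRun:
    """One recorded execution of a system test (hypothetical record format)."""

    test_name: str
    passed: bool


def health_summary(runs: list[SystemTestRun], window: int = 5) -> dict[str, float]:
    """Pass rate of each test over its most recent `window` runs.

    `runs` is assumed to be ordered oldest first.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    history: dict[str, list[bool]] = defaultdict(list)
    for run in runs:
        history[run.test_name].append(run.passed)
    summary: dict[str, float] = {}
    for name, results in history.items():
        recent = results[-window:]
        summary[name] = sum(recent) / len(recent)
    return summary


def unhealthy(summary: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Names of tests whose recent pass rate is below `threshold`, sorted."""
    return sorted(name for name, rate in summary.items() if rate < threshold)


runs = [
    SystemTestRun("example_s3", True),
    SystemTestRun("example_emr", True),
    SystemTestRun("example_s3", False),
    SystemTestRun("example_emr", True),
]
summary = health_summary(runs)
print(summary)  # {'example_s3': 0.5, 'example_emr': 1.0}
print(unhealthy(summary))  # ['example_s3']
```

Using a sliding window rather than all-time history keeps a test that was fixed last week from looking permanently flaky, which is what makes frequent runs useful for spotting regressions.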

Migrate Apache Oozie Workflows to Airflow and Run with Amazon EMR

2023-07-01
session

Learn how to convert Oozie workflows into Airflow DAGs and run them on Amazon EMR. The conversion utility supports Airflow 2.4.3 and is built on top of https://github.com/GoogleCloudPlatform/oozie-to-airflow
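One step any Oozie-to-Airflow conversion has to perform is turning the workflow's XML control flow into task dependencies. The sketch below, which is not the oozie-to-airflow tool's implementation, handles only the simplest case: it reads each `<action>` node's `<ok to="..."/>` transition and returns upstream/downstream pairs between actions, which a generated DAG could express as `upstream >> downstream`. Fork, join, and decision nodes, error transitions, and the action bodies themselves are deliberately ignored, and the sample workflow is a made-up, abbreviated example.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{uri:oozie:workflow:0.5}' from a tag."""
    return tag.rsplit("}", 1)[-1]


def oozie_action_edges(workflow_xml: str) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs linking <action> nodes via <ok to=...>."""
    root = ET.fromstring(workflow_xml)
    next_on_ok: dict[str, str] = {}
    for node in root:
        if local_name(node.tag) != "action":
            continue
        for child in node:
            if local_name(child.tag) == "ok":
                next_on_ok[node.get("name")] = child.get("to")
    # Keep only transitions that land on another action (not on <end>, <kill>, ...).
    return [(src, dst) for src, dst in next_on_ok.items() if dst in next_on_ok]


WORKFLOW = """\
<workflow-app xmlns="uri:oozie:workflow:0.5" name="demo-wf">
  <start to="extract"/>
  <action name="extract">
    <shell xmlns="uri:oozie:shell-action:0.3"><exec>extract.sh</exec></shell>
    <ok to="transform"/>
    <error to="fail"/>
  </action>
  <action name="transform">
    <shell xmlns="uri:oozie:shell-action:0.3"><exec>transform.sh</exec></shell>
    <ok to="load"/>
    <error to="fail"/>
  </action>
  <action name="load">
    <shell xmlns="uri:oozie:shell-action:0.3"><exec>load.sh</exec></shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Workflow failed</message></kill>
  <end name="end"/>
</workflow-app>
"""

print(oozie_action_edges(WORKFLOW))  # [('extract', 'transform'), ('transform', 'load')]
```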

My Journey to Committer Status: What I learned and how it can help you

2023-07-01
session
Niko Oliveira (Amazon | Apache Airflow Committer)

Apache Airflow is one of the largest Apache projects by many metrics, and it ranks particularly high in the number of contributors involved. This leads to hundreds of GitHub issues, pull requests, and discussions being submitted to the project every month, so it is critical to have enough committers to support the community. In this talk, I will summarize my personal experience working towards, and ultimately achieving, committer status in Apache Airflow. I’ll cover the lessons I learned along the way and provide advice and best practices to help others achieve committer status themselves.

Opportunities to join the Airflow (docs) community

2023-07-01
session

Open source doc edits provide a low-stakes way for new users to make their first contribution. Ideally, new users find opportunities and feel welcome to fix docs as they learn, engaging with the community from the start. However, I found that contributing docs to Airflow had some surprising obstacles. In this talk, I’ll share my first docs contribution journey, including the problems I hit and how they were fixed. For example, you must understand how Airflow uses Sphinx and know when to edit in the GitHub UI versus locally. But it wasn't documented that GitHub renders only Markdown previews, and since Sphinx uses reStructuredText markup, you must build the docs locally to check formatting; this gap was an opportunity for me to add to the Contributor Guide for docs. In addition to examples of reducing obstacles, this talk covers the importance of docs for the community and the resources available to start writing. If you already contribute and want to create opportunities for others, I’ll also share the characteristics of good first issues and docs projects.