talk-data.com talk-data.com

Event

Airflow Summit 2021

2021-07-01 Airflow Summit Visit website ↗

Activities tracked

6

Airflow Summit 2021 program

Filtering by: GitHub ×

Sessions & talks

Showing 1–6 of 6 · Newest first

Search within this event →

Advanced Superset for Engineers (API’s, Version Controlled Dashboards, & more)

2021-07-01
session

Apache Superset is a modern, open-source data exploration & visualization platform originally created by Maxime Beauchemin. In this talk, I will showcase advanced technical Superset features like the rich Superset API, how to version control dashboards using Github, embedding Superset charts in other applications, and more. This talk will be technical and hands-on, and I will share all code examples I use so you can play with them yourself afterwards!

Contributing to Apache Airflow | Journey to becoming Airflow's leading contributor

2021-07-01
session

From not knowing Python (let alone Airflow), and from submitting the first PR that fixes typo to becoming Airflow Committer, PMC Member, Release Manager, and #1 Committer this year, this talk walks through Kaxil’s journey in the Airflow World. The second part of this talk explains: how you can also start your OSS journey by contributing to Airflow Expanding familiarity with a different part of the Airflow codebase Continue committing regularly & steadily to become Airflow Committer. (including talking about current Guidelines of becoming a Committer) Different mediums of communication (Dev list, users list, Slack channel, Github Discussions etc)

Creating Data Pipelines with Elyra, a visual DAG composer and Apache Airflow

2021-07-01
session

This presentation will detail how Elyra creates Jupyter Notebook, Python and R script- based pipelines without having to leave your web browser. The goal of using Elyra is to help construct data pipelines by surfacing concepts and patterns common in pipeline construction into a familiar, easy to navigate interface for Data Scientists and Engineers so they can create pipelines on their own. In Elyra’s Pipeline Editor UI, portions of Apache Airflow’s domain language are surfaced to the user and either made transparent or understandable through the use of tooltips or helpful notes in the proper context during pipeline construction. With these features, Elyra can rapidly prototype data workflows without the need to know or write any pipeline code. Lastly, we will look at what features we have planned on our roadmap for Airflow, including more robust Kubernetes integration and support for runtime specific components/operators. Project Home: https://github.com/elyra-ai/elyra

Introducing Viewflow: a framework for writing data models without writing Airflow code

2021-07-01
session

In this talk, we present Viewflow, an open-source Airflow-based framework that allows data scientists to create materialized views in SQL, R, and Python without writing Airflow code. We will start by explaining what problem does Viewflow solve: writing and maintaining complex Airflow code instead of focusing on data science. Then we will see how Viewflow solves that problem. We will continue by showing how to use VIewflow with several real-world examples. Finally, we will see what the upcoming features of Viewflow are! Resources: Announcement blog post: https://medium.com/datacamp-engineering/viewflow-fe07353fa068 GitHub repo: https://github.com/datacamp/viewflow

Robots are your friends - using automation to keep your Airflow operators up to date

2021-07-01
session

As part of my role at Google, maintaining samples for Cloud Composer, hosted managed Airflow, is crucial. It’s not feasible for me to try out every sample every day to check that it’s working. So, how do I do it? Automation! While I won’t let the robots touch everything, they let me know when it’s time to pay attention. Here’s how: Step 0: An update for the operators is released Step 1: A GitHub bot called Renovate Bot opens up a PR to a special requirements file to make this update Step 2: Cloud build runs unit tests to make sure none of my DAGs immediately break Step 3: PR is approved and merged to main Step 4: Cloud Build updates my dev environment Step 5: I look at my DAGs in dev to make sure all is well. If there is a problem, I need to resolve it manually and revert my requirements file. Step 6: I manually update my prod PyPI packages I’ll discuss what automation tools I choose to use and why, and the places where I intentionally leave manual steps to ensure proper oversight.

Workshop: Contributing to Apache Airflow

2021-07-01
session

Participation in this workshop requires previous registration and has limited capacity. Get your ticket at https://ti.to/airflowsummit/2021-contributor By attending this workshop, you will learn how you can become a contributor to the Apache Airflow project. You will learn how to setup a development environment, how to pick your first issue, how to communicate effectively within the community and how to make your first PR - experienced committers of Apache Airflow project will give you step-by-step instructions and will guide you in the process. When you finish the workshop you will be equipped with everything that is needed to make further contributions to the Apache Airflow project. Prerequisites: You need to have Python experience . Previous experience in Airflow is nice-to-have. The session is geared towards Mac and Linux users. If you are a Windows user, it is best if you install Windows Subsystem for Linux (WSL). In preparation for the class, please make sure you have set up the following prerequisites: make a fork of the https://github.com/apache/airflow clone the forked repository locally follow the Breeze prerequisites: https://github.com/apache/airflow/blob/master/BREEZE.rst#prerequisites run ./breeze --python 3.6 create a virtualenv as described in https://github.com/apache/airflow/blob/master/LOCAL_VIRTUALENV.rst part of preparing the virtualenv is initializing it with ./breeze initialize-local-virtualenv