talk-data.com

Event

Airflow Summit 2022

2022-07-01 Airflow Summit

Activities tracked

7

Airflow Summit 2022 program

Filtering by: Cloud Computing

Sessions & talks

Airflow in the Cloud: Lessons from the Field

2022-07-01
session

Airflow users love to run Airflow in public clouds and on distributed infrastructure like Kubernetes. Running Airflow environments is easier than ever: the community offers a Helm-based installation for self-managed Airflow, and there are many Airflow-based managed services. The commoditization of Airflow and its broader user base bring new challenges. This talk presents observations from an Airflow service provider delivering “Airflow as a Service” to cloud users (very technical, less technical and not technical at all). The information presented is directed at Apache Airflow committers and contributors, in the hope of influencing Airflow’s future roadmap so that Apache Airflow becomes easy to use.

Happy DAGs + Happy Teammates: How a little CI/CD can go a long way

2022-07-01
session

With a small amount of Cloud Build automation and GitHub version control, your Airflow DAGs will always be tested and in sync no matter who is working on them. Leah will walk you through a sample CI/CD workflow for keeping your Airflow DAGs tested and in sync between environments and teammates.
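
As an illustration of the kind of check such a CI/CD workflow might run on every commit, here is a minimal sketch that scans a DAG folder and reports files that fail to parse. The function names and the folder layout are assumptions for this example, not part of the talk. It catches syntax errors only; a fuller check would typically load the DAGs with Airflow itself (for instance through `DagBag` and its `import_errors`), which this sketch deliberately avoids so that it runs without Airflow installed.

```python
import ast
import tempfile
from pathlib import Path


def find_broken_dag_files(dags_dir):
    """Return {file name: error message} for DAG files that fail to parse."""
    broken = {}
    for path in sorted(Path(dags_dir).glob("*.py")):
        try:
            # Parsing catches syntax errors only; it does not import Airflow.
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            broken[path.name] = f"line {exc.lineno}: {exc.msg}"
    return broken


def demo():
    """Build a throwaway DAG folder with one valid and one broken file."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file contents, for illustration only.
        Path(tmp, "good_dag.py").write_text("schedule = '@daily'\n", encoding="utf-8")
        Path(tmp, "bad_dag.py").write_text("def broken(:\n", encoding="utf-8")
        return find_broken_dag_files(tmp)
```

A build step could call `find_broken_dag_files` on the repository's DAG folder and fail the build when the returned dictionary is not empty, so a broken DAG never reaches a shared environment.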

Modern Data Orchestration managed by Astronomer

2022-07-01
session

At Astronomer we have been long-time supporters of and contributors to open source Apache Airflow. In this session we will present Astronomer’s latest journey, Astro, our cloud-native managed service that simplifies data orchestration and reduces operational overhead. We will also discuss the increasing importance of data orchestration in modern enterprise data platforms, industry trends, and practical problems that arise in ever-expanding heterogeneous environments.

TFX on Airflow with delegation of processing to third party services

2022-07-01
session

TensorFlow Extended (TFX) can run machine learning pipelines on Airflow, but by default all the steps run in the same workers where the Airflow DAG is running. This can lead to excessive resource usage, and it breaks the assumption that Airflow is a scheduler: Airflow also becomes the data processing platform. In this session, we will see how to use TFX with third-party services on top of Google Cloud Platform. The data processing steps can be run in Dataflow, Spark, Flink and other runners (parallelizing the processing of data and scaling up to petabytes), and the training steps can be run in Vertex AI or other external services. After this workshop, you will have learnt how to move any heavyweight TFX computation outside Airflow, while keeping Airflow as the orchestrator for your machine learning pipelines.
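
To make the idea of delegating data processing concrete, the following sketch builds the Apache Beam pipeline options that select the Dataflow runner. The helper name, project ID, region and bucket are hypothetical values chosen for this example. In a TFX pipeline definition, a list like this is usually supplied through the `beam_pipeline_args` argument, so that Beam-based components run on Dataflow rather than on the Airflow worker; the workshop's actual configuration may differ.

```python
def dataflow_beam_args(project, region, temp_location):
    """Build Beam pipeline options that select the Dataflow runner."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
    ]


# Hypothetical project, region and bucket, for illustration only.
BEAM_ARGS = dataflow_beam_args(
    project="my-gcp-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)
```

Keeping the runner choice in one place like this means the same pipeline code can fall back to local execution by swapping the list, while Airflow remains responsible only for scheduling.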

Using Apache Airflow to orchestrate workflows across hybrid environments

2022-07-01
session

According to analysts, 87 percent of enterprises have already adopted hybrid cloud strategies (https://www.flexera.com/blog/industry-trends/trend-of-cloud-computing-2020/). Customers have many reasons for supporting hybrid environments, from maximising the value of heritage systems to meeting local compliance and data processing regulations. As they build their data pipelines, they increasingly need to orchestrate them across on-premises and cloud environments. In this session, I will share how you can leverage Apache Airflow to orchestrate a workflow using data sources inside and outside the cloud.

Vega: Unifying Machine Learning Workflows at Credit Karma using Apache Airflow

2022-07-01
session
Nicholas Pataki (Credit Karma), Debasish Das, Raj Katakam (Credit Karma)

At Credit Karma, we enable financial progress for more than 100 million members by recommending personalized financial products when they interact with our application. In this talk we introduce our machine learning platform for building interactive and production model-building workflows that serve relevant financial products to Credit Karma users.

Vega, Credit Karma’s Machine Learning Platform, has 3 major components:
1) QueryProcessor for feature and training data generation, backed by Google BigQuery,
2) PipelineProcessor for feature transformations, offline scoring and model analysis, backed by Apache Beam,
3) ModelProcessor for running TensorFlow and scikit-learn models, backed by Google AI Platform, which gives data scientists the flexibility to explore different kinds of machine learning or deep learning models, ranging from gradient boosted trees to neural networks with complex structures.

Vega exposes a unified Python API for Feature Generation, Modeling ETL, Model Training and Model Analysis. Vega supports writing interactive notebooks and Python scripts that run these components in local mode with sampled data and in cloud mode for large-scale distributed computing. Vega lets data scientists chain the processors through Python code to define the entire workflow. It then automatically generates the execution plan for deploying the workflow on Apache Airflow to run offline model experiments and refreshes. Overall, with the unified Python API and automated Airflow DAG generation, Vega has improved the efficiency of ML Engineering. Using Airflow, we deploy more than 20K features and 100 models daily.
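
The chaining of processors into an execution plan can be pictured with a small sketch. This is not Vega's API, which the abstract does not show: the `Processor` and `Workflow` classes and the `then` and `execution_plan` methods are invented for illustration, and only the processor and backend names come from the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Processor:
    """One workflow step; names mirror the abstract, the API is invented."""
    name: str
    backend: str


@dataclass
class Workflow:
    steps: List[Processor] = field(default_factory=list)

    def then(self, step):
        """Append a step and return self so calls can be chained."""
        self.steps.append(step)
        return self

    def execution_plan(self):
        """Return (upstream, downstream) name pairs for consecutive steps."""
        return [(a.name, b.name) for a, b in zip(self.steps, self.steps[1:])]


workflow = (
    Workflow()
    .then(Processor("QueryProcessor", "BigQuery"))
    .then(Processor("PipelineProcessor", "Apache Beam"))
    .then(Processor("ModelProcessor", "Google AI Platform"))
)
plan = workflow.execution_plan()
```

Each pair in `plan` corresponds to one upstream-to-downstream dependency, which is the information a generator would need to emit task dependencies in an Airflow DAG.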

Workshop: Running Airflow within Cloud Composer

2022-07-01
session

This workshop is sold out. A hands-on workshop showing how easy it is to deploy Airflow in a public cloud. The workshop consists of 3 parts: 1) setting up an Airflow environment and CI/CD for DAG deployment, 2) authoring a DAG, 3) troubleshooting Airflow DAG/task execution failures. The workshop is based on Cloud Composer (https://cloud.google.com/composer). It is mostly targeted at Airflow newbies and users who would like to learn more about Cloud Composer and how to develop DAGs using Google Cloud Platform services like BigQuery, Vertex AI and Dataflow.