Airflow Summit 2022 program

Sessions & talks

Needing to trigger DAGs based on external criteria is a common use case for data engineers, data scientists, and data analysts. Most Airflow users are familiar with sensors and how they can be used to run DAGs outside of a standard schedule, but sensors are only one of several methods available for implementing event-based DAGs. In this session, we’ll discuss different ways of implementing event-based DAGs using Airflow 2 features such as the REST API and deferrable operators, with a focus on how to determine which method is the most efficient, scalable, and cost-friendly for your use case.
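As one illustration of the API-based approach the abstract mentions, a minimal sketch of triggering a DAG run from an external event through Airflow 2's stable REST API might look like the following; the DAG id, credentials, and conf payload are placeholders, not details from the talk.

```python
# A minimal sketch of the API-based approach: an external system creates a DAG
# run through Airflow 2's stable REST API. Assumes the API is enabled with the
# basic-auth backend; the DAG id, credentials, and conf payload are placeholders.
import requests

AIRFLOW_URL = "http://localhost:8080"  # assumed webserver address


def trigger_dag(dag_id: str, conf: dict) -> dict:
    """Create a DAG run, passing event metadata through the run's conf."""
    resp = requests.post(
        f"{AIRFLOW_URL}/api/v1/dags/{dag_id}/dagRuns",
        json={"conf": conf},
        auth=("airflow", "airflow"),  # placeholder credentials
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    # For example, called from a webhook handler when an upstream file lands.
    run = trigger_dag("process_new_data", {"s3_key": "incoming/2022-05-01.csv"})
    print(run["dag_run_id"], run["state"])
```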
Let's use Airflow differently: let's talk load tests
Numeric results with bulletproof confidence: this is what companies actually sell when promoting their machine learning product. Yet this seems out of reach when the product is both generic and complex, with many of the inner calculations hidden from the end user. So how can code improvements or changes in core component performance be tested at scale? Implementing API and load tests is thorough but time-consuming work: defining parameters, building infrastructure, and debugging. The bugs may be real, but they can also be the result of a poor infrastructure implementation (who is testing the testers?). In this session we will discuss how Airflow can help scale up testing in a stable and sustainable way.
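A minimal sketch, not the speakers' implementation, of how an Airflow 2 DAG might orchestrate such load tests: a few parallel "wave" tasks hit a target endpoint and a reporting task summarizes the latencies. The target URL, wave sizes, and task names are assumptions.

```python
# A minimal sketch (assumed, not the speakers' implementation) of an Airflow 2
# DAG that runs load-test "waves" as parallel tasks and reports the results.
# TARGET_URL, the wave sizes, and the latency metric are illustrative.
from datetime import datetime
import time

import requests
from airflow.decorators import dag, task

TARGET_URL = "https://staging.example.com/predict"  # assumed system under test


@dag(schedule_interval=None, start_date=datetime(2022, 1, 1), catchup=False)
def load_test():
    @task
    def fire_wave(n_requests: int) -> dict:
        """Send n_requests to the target and return simple latency stats."""
        latencies = []
        for _ in range(n_requests):
            t0 = time.monotonic()
            requests.post(TARGET_URL, json={"payload": "synthetic"}, timeout=30)
            latencies.append(time.monotonic() - t0)
        return {"n": n_requests, "avg_s": sum(latencies) / len(latencies)}

    @task
    def report(small: dict, medium: dict, large: dict) -> None:
        """Print a summary so regressions show up directly in the task logs."""
        for r in (small, medium, large):
            print(f"{r['n']} requests -> avg latency {r['avg_s']:.3f}s")

    # Three independent waves run as separate (parallelizable) task instances.
    report(fire_wave(10), fire_wave(100), fire_wave(1000))


load_test()
```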
In this talk we present how Airbnb extends the REST API to support on-demand workloads. A DAG object is created in a local environment such as a Jupyter notebook, serialized into a binary format, and transported to the API. The API persists the DAG object in the metadata DB, and the Airflow scheduler and worker are extended to process this new kind of DAG.
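A purely hypothetical client-side sketch of the pattern described above: a DAG is built locally, serialized, and submitted to a custom endpoint. The /api/v1/on-demand-dags route and credentials are assumptions standing in for Airbnb's internal extension; they are not part of stock Airflow.

```python
# A hypothetical client-side sketch of the pattern described above: build a DAG
# locally (for example in a Jupyter notebook), serialize it, and submit it to a
# custom API endpoint. The /api/v1/on-demand-dags route and the credentials are
# assumptions standing in for Airbnb's internal extension, not stock Airflow.
import pickle
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="adhoc_backfill_2022_05",
    start_date=datetime(2022, 5, 1),
    schedule_interval=None,
) as dag:
    BashOperator(task_id="run_job", bash_command="echo running ad-hoc workload")

# Serialize the in-memory DAG object and ship it to the extended API, which
# (per the talk) persists it in the metadata DB for the scheduler and workers.
payload = pickle.dumps(dag)
resp = requests.post(
    "http://airflow.example.com/api/v1/on-demand-dags",  # hypothetical endpoint
    data=payload,
    headers={"Content-Type": "application/octet-stream"},
    auth=("user", "pass"),  # placeholder credentials
)
resp.raise_for_status()
print("submitted:", resp.json())
```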
Vega: Unifying Machine Learning Workflows at Credit Karma using Apache Airflow
At Credit Karma, we enable financial progress for more than 100 million members by recommending personalized financial products to them when they interact with our application. In this talk we introduce our machine learning platform for building interactive and production model-building workflows that serve relevant financial products to Credit Karma users.

Vega, Credit Karma’s machine learning platform, has three major components:
1) QueryProcessor for feature and training-data generation, backed by Google BigQuery;
2) PipelineProcessor for feature transformations, offline scoring, and model analysis, backed by Apache Beam;
3) ModelProcessor for running TensorFlow and scikit-learn models, backed by Google AI Platform, which gives data scientists the flexibility to explore different kinds of machine learning and deep learning models, ranging from gradient-boosted trees to neural networks with complex structures.

Vega exposes a unified Python API for feature generation, modeling ETL, model training, and model analysis. It supports writing interactive notebooks and Python scripts to run these components in local mode with sampled data and in cloud mode for large-scale distributed computing. Vega lets users chain the processors through Python code to define the entire workflow, then automatically generates the execution plan for deploying the workflow on Apache Airflow to run offline model experiments and refreshes. Overall, with the unified Python API and automated Airflow DAG generation, Vega has improved the efficiency of ML engineering; using Airflow, we deploy more than 20K features and 100 models daily.
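Since Vega is an internal platform, the following is only an illustrative sketch of what chaining the three processors into a workflow might look like; the vega module, class names, and parameters are all assumptions based on the description above.

```python
# A purely illustrative sketch of the workflow definition described above. Vega
# is Credit Karma's internal platform, so the vega module, class names, and
# parameters here are assumptions, not its real API.
from vega import ModelProcessor, PipelineProcessor, QueryProcessor, Workflow

# 1) Feature and training-data generation, backed by Google BigQuery.
features = QueryProcessor(
    name="training_data",
    sql="SELECT * FROM `warehouse.features.daily` WHERE ds = '{{ ds }}'",
)

# 2) Feature transformation, offline scoring, and model analysis on Apache Beam.
transformed = PipelineProcessor(name="transform", inputs=[features])

# 3) Model training on Google AI Platform (TensorFlow or scikit-learn).
model = ModelProcessor(name="gbt_model", framework="tensorflow", inputs=[transformed])

# Chain the processors into one workflow: run locally on sampled data first,
# then deploy, which generates and registers the Airflow DAG for daily refreshes.
workflow = Workflow(name="card_recommendation", steps=[features, transformed, model])
workflow.run(mode="local", sample_fraction=0.01)
workflow.deploy(schedule="@daily")
```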