
Event

Airflow Summit 2022

2022-07-01 Airflow Summit

Activities tracked: 4

Airflow Summit 2022 program

Filtering by: Python

Sessions & talks

Showing 1–4 of 4 · Newest first


Beyond Testing: How to Build Circuit Breakers with Airflow

2022-07-01
session

Testing is an important part of the DataOps life cycle, giving teams confidence in the integrity of their data as it moves downstream to production systems. But what happens when testing doesn’t catch all of your bad data and “unknown unknown” data quality issues fall through the cracks? Fortunately, data engineers can borrow from DevOps best practices to tackle data quality at scale with circuit breakers, a novel approach that stops bad data from entering your pipelines in the first place. In this talk, Prateek Chawla, Founding Team Member and Technical Lead at Monte Carlo, will discuss what circuit breakers are, how to integrate them with your Airflow DAGs, and what this looks like in practice. Time permitting, Prateek will also walk through how to build and automate Airflow circuit breakers across multiple cascading pipelines with Python and other common tools.
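The abstract includes no code, but the core pattern is easy to sketch with stock Airflow operators: a quality-check task that fails hard so downstream tasks never run. This is a minimal sketch of that pattern, not the speaker’s implementation; the table name and the fetch_row_count() helper are hypothetical, and Monte Carlo’s own operators are not shown.

```python
# Minimal sketch of the circuit-breaker pattern with stock Airflow
# operators; the Monte Carlo tooling discussed in the talk is not shown.
# The table name and the fetch_row_count() helper are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.exceptions import AirflowFailException
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator


def fetch_row_count(table: str) -> int:
    # Stand-in for a real data quality query against the warehouse,
    # hard-coded so the sketch stays self-contained.
    return 0


def check_circuit(**context):
    # Fail hard if the data looks bad, so nothing downstream runs.
    if fetch_row_count("analytics.daily_orders") == 0:
        raise AirflowFailException("Circuit open: no rows landed today")


with DAG(
    dag_id="orders_pipeline_with_breaker",
    start_date=datetime(2022, 7, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = EmptyOperator(task_id="extract")
    breaker = PythonOperator(task_id="circuit_breaker", python_callable=check_circuit)
    publish = EmptyOperator(task_id="publish_to_prod")

    extract >> breaker >> publish
```

Note that this differs from ShortCircuitOperator, which skips downstream tasks and still marks the run as a success: raising AirflowFailException marks the run as failed, so the bad data is surfaced rather than silently ignored.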

Introducing Astro Python SDK: The next generation of DAG authoring

2022-07-01
session

Imagine if you could chain together SQL models using nothing but Python, and write functions that treat Snowflake tables like dataframes and dataframes like SQL tables. Imagine if you could write a SQL Airflow DAG using only Python, or with no Python at all. With the Astro SDK, we at Astronomer have gone back to the drawing board on fundamental questions of what DAG authoring could look like. Our goal is to empower data engineers, data scientists, and even business analysts to write Airflow DAGs with code that reflects the data movement rather than the system configuration. Astro lets each group focus on producing value in their respective fields with minimal knowledge of Airflow and a high degree of flexibility between SQL- and Python-based systems.

This goes well beyond a new way of writing DAGs: it is a universal, agnostic data transfer system. Users can run the exact same code against different databases (Snowflake, BigQuery, etc.) and datastores (GCS, S3, etc.) with no changes except to the connection IDs, and can promote a SQL flow from their dev Postgres to their prod Snowflake with a single variable change. We are ecstatic to reveal over eight months of work building a new open-source project that will significantly improve your DAG authoring experience!
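To make the idea concrete, here is a minimal sketch using the SDK’s documented @aql.transform and @aql.dataframe decorators. Exact import paths and signatures have shifted across SDK releases, and the connection IDs and table names below are hypothetical.

```python
# A minimal sketch based on the Astro SDK's documented decorators.
# Exact import paths and signatures have shifted across SDK releases,
# and the connection IDs and table names below are hypothetical.
from datetime import datetime

import pandas as pd
from airflow import DAG
from astro import sql as aql
from astro.sql.table import Table


@aql.transform()
def recent_orders(orders: Table):
    # Returns SQL, not Python: the SDK renders {{ orders }} and runs
    # the query inside the database the input table lives in.
    return "SELECT * FROM {{ orders }} WHERE order_date >= '2022-07-01'"


@aql.dataframe()
def daily_totals(df: pd.DataFrame) -> pd.DataFrame:
    # The SDK materializes the upstream SQL result as a dataframe,
    # so SQL and pandas steps chain together transparently.
    return df.groupby("order_date", as_index=False)["amount"].sum()


with DAG(
    dag_id="astro_sdk_sketch",
    start_date=datetime(2022, 7, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    filtered = recent_orders(orders=Table(name="orders", conn_id="dev_postgres"))
    totals = daily_totals(filtered)
```

Promoting this flow from a dev Postgres to a prod Snowflake would then come down to swapping the conn_id, matching the single-variable change the abstract describes.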

Vega: Unifying Machine Learning Workflows at Credit Karma using Apache Airflow

2022-07-01
session
Nicholas Pataki (Credit Karma), Debasish Das, Raj Katakam (Credit Karma)

At Credit Karma, we enable financial progress for more than 100 million members by recommending them personalized financial products when they interact with our application. In this talk we introduce the machine learning platform we built for interactive and production model-building workflows that serve relevant financial products to Credit Karma users.

Vega, Credit Karma’s machine learning platform, has three major components:

1) QueryProcessor, for feature and training data generation, backed by Google BigQuery
2) PipelineProcessor, for feature transformations, offline scoring, and model analysis, backed by Apache Beam
3) ModelProcessor, for running TensorFlow and scikit-learn models, backed by Google AI Platform, which gives data scientists the flexibility to explore machine learning and deep learning models ranging from gradient-boosted trees to neural networks with complex structures

Vega exposes a unified Python API for feature generation, modeling ETL, model training, and model analysis. It supports interactive notebooks and Python scripts that run these components in local mode on sampled data or in cloud mode for large-scale distributed computing. Data scientists chain the processors through Python code to define the entire workflow, and Vega then automatically generates the execution plan for deploying the workflow on Apache Airflow for offline model experiments and refreshes. Overall, with the unified Python API and automated Airflow DAG generation, Vega has improved the efficiency of ML engineering: using Airflow, we deploy more than 20K features and 100 models daily.
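Vega itself is internal to Credit Karma, so the sketch below only illustrates the general pattern the abstract describes: chain processor steps in Python, then compile the chain into an Airflow DAG automatically. Every class, function, and task name here is hypothetical.

```python
# Vega is internal to Credit Karma, so this only sketches the general
# pattern the abstract describes: chain processor steps in Python, then
# compile the chain into an Airflow DAG. All names here are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


class Processor:
    """One step of a workflow; `run` does the actual work."""

    def __init__(self, name, run):
        self.name = name
        self.run = run


def build_dag(dag_id: str, processors: list) -> DAG:
    # Compile an ordered processor chain into a linear Airflow DAG,
    # standing in for Vega's automatic execution-plan generation.
    dag = DAG(
        dag_id,
        start_date=datetime(2022, 7, 1),
        schedule_interval="@daily",
        catchup=False,
    )
    previous = None
    for proc in processors:
        task = PythonOperator(task_id=proc.name, python_callable=proc.run, dag=dag)
        if previous is not None:
            previous >> task
        previous = task
    return dag


# A data scientist only writes the chain; the platform handles deployment.
dag = build_dag("vega_style_workflow", [
    Processor("query_processor", lambda: print("generate training data")),
    Processor("pipeline_processor", lambda: print("transform features")),
    Processor("model_processor", lambda: print("train and analyze model")),
])
```

A real platform compiler would also handle fan-out, retries, and resource configuration; the linear chain above just shows the shape of the idea.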

Wisdoms learnt when contributing to Apache Airflow

2022-07-01
session

In this talk, I am going to share things that I learned while contributing to Apache Airflow. I am an Outreachy intern for Apache Airflow, and I made my first open-source contribution in the Apache Airflow project. I will also give a short description of myself and my experience working in software engineering, how I needed help contributing to open source, and how I ended up as an Outreachy intern. I would also like to share my first contribution to the Apache Airflow docs and how much confidence it gave me to keep contributing.

Key things that I learned while contributing to Apache Airflow:

- Clear communication in written form is very powerful.
- Code is not an asset; don’t worry about throwing it away.
- Don’t feel shy about asking questions.
- Open source is a rich ecosystem where projects help each other and thrive.
- Things I once dismissed as trivial no longer seem trivial to me.

While the points above are general lessons about open-source contribution, I also had learnings specific to me: writing unit tests, communicating with developers across the globe, improving my written communication, discovering many Python libraries, and understanding the CI pipeline.