talk-data.com

Event

Airflow Summit 2021

2021-07-01 Airflow Summit

Activities tracked

4

Airflow Summit 2021 program

Filtering by: Data Science

Sessions & talks

Building a robust data pipeline with the dAG stack: dbt, Airflow, Great Expectations

2021-07-01
session

Data quality has become a much-discussed topic in data engineering and data science, and it has become clear that data validation is crucial to ensuring the reliability of any data products and insights produced by an organization’s data pipelines. This session will outline patterns for combining three popular open-source tools in the data ecosystem (dbt, Airflow, and Great Expectations) and use them to build a robust data pipeline with data validation at each critical step.

Building the Data Science Platform with Airflow @Near

2021-07-01
session

At Near, we work on terabytes of location data with near-real-time modelling to generate key consumer insights and estimates for our clients across the globe. We have hundreds of country-specific models deployed and managed through Airflow to achieve this goal. Some of the workflows we have deployed are schedule-based, some are dynamic, and some are trigger-based. In this session, I will discuss some of the workflows that are scheduled and monitored using Airflow, the key benefits, and the challenges we have faced in our production systems.

Introducing Viewflow: a framework for writing data models without writing Airflow code

2021-07-01
session

In this talk, we present Viewflow, an open-source Airflow-based framework that allows data scientists to create materialized views in SQL, R, and Python without writing Airflow code. We will start by explaining the problem Viewflow solves: data scientists end up writing and maintaining complex Airflow code instead of focusing on data science. Then we will see how Viewflow solves that problem. We will continue by showing how to use Viewflow with several real-world examples. Finally, we will see what the upcoming features of Viewflow are!

Resources:
Announcement blog post: https://medium.com/datacamp-engineering/viewflow-fe07353fa068
GitHub repo: https://github.com/datacamp/viewflow

Productionizing ML Pipelines with Airflow, Kedro, and Great Expectations

2021-07-01
session

Machine learning models can add value and insight to many projects, but they can be challenging to put into production due to problems like lack of reproducibility, difficulty maintaining integrations, and sneaky data quality issues. Kedro, a framework for creating reproducible, maintainable, and modular data science code, and Great Expectations, a framework for data validation, are two great open-source Python tools that can address some of these problems. Both integrate seamlessly with Airflow for flexible and powerful ML pipeline orchestration. In this talk, we’ll discuss how you can leverage existing Airflow provider packages to integrate these tools and create sustainable, production-ready ML models.