talk-data.com talk-data.com

Event

Airflow Summit 2020

2020-07-01 Airflow Summit Visit website ↗

Activities tracked

4

Airflow Summit 2020 program

Filtering by: Data Engineering ×

Sessions & talks

Showing 1–4 of 4 · Newest first

Search within this event →

Achieving Airflow Observability

2020-07-01
session

Identify issues in a fraction of the time and streamline root cause analysis for your DAGs. Airflow is the leading orchestration platform for data engineers. But when running Airflow at production scale, many teams have bigger needs for monitoring jobs, creating the right level of alerting, tracking problems in data, and finding the root cause of errors. In this talk we will cover our suggested approach to gaining Airflow observability so that you have the visibility you need to be productive. What is observability? The capability of monitoring and analyzing event logs, along with KPIs and other data, that yields actionable insights. In the data engineering context, observability is crucial for finding problems in jobs and data before those problems impact data consumers downstream. It’s a particularly difficult challenge because of the different platforms data engineers use (Airflow, Spark, Kubernetes, etc.) and the complicated life cycle of data pipeline CI/CD. In the session, we will do a deep dive into the visibility gaps your team might face running production-scale Airflow. We will walk through a typical day in the life of finding errors in DAGs, offer best practices, and discuss open source tools you can use to extend Airflow for observability and robust monitoring. We will use standard Airflow DAG examples to guide the presentation.

Achieving Airflow observability with Databand

2020-07-01
session
Josh Benamram (Databand.ai)

While Airflow is a central product for data engineering teams, it’s usually one piece of a bigger puzzle. The vast majority of teams use Airflow in combination with other tools like Spark, Snowflake, and BigQuery. Making sure pipelines are reliable, detecting issues that lead to SLA misses, and identifying data quality problems requires deep visibility into DAGs and data flows. Join this session to learn how Databand’s observability system makes it easy to monitor your end-to-end pipeline health and quickly remediate issues. This is a sponsored talk, presented by Databand .

Data engineering hierarchy of needs

2020-07-01
session
Angel Daz (Ocelot Data)

Data Infrastructures look differently between small, mid, and large sized companies. Yet, most content out there is for large and sophisticated systems. And almost none of it is on migrating a legacy, on-prem, databases over to the cloud. In order to better explain the evolving needs of data engineering organizations, we will review the hierarchy of needs for data engineering.

Pipelines on pipelines: Agile CI/CD workflows for Airflow DAGs

2020-07-01
session
Victor Shafran (Databand)

How do you create fast and painless delivery of new DAGs into production? When running Airflow at scale, it becomes a big challenge to manage the full lifecycle around your pipelines; making sure that DAGs are easy to develop, test, and ship into prod. In this talk, we will cover our suggested approach to building a proper CI/CD cycle that ensures the quality and fast delivery of production pipelines. CI/CD is the practice of delivering software from dev to prod, optimized for fast iteration and quality control. In the data engineering context, DAGs are just another piece of software that require some form of lifecycle management. Traditionally, DAGs have been thought of as relatively static, but the new wave of analytics and machine learning efforts require more agile DAG development, in line with how agile software engineering teams build and ship code. In this session, we will dive into the challenges of building CI/CD cycles for Airflow DAGs. We will focus on a pipeline that involves Apache Spark as an extra dimension of real-world complexity, walking through a typical flow of DAG authoring, debugging, and testing, from local to staging to prod environments. We will offer best practices and discuss open-source tools you can use to easily build your own smooth cycle for Airflow CI/CD.