talk-data.com

Event

Airflow Summit 2021

2021-07-01 Airflow Summit

Activities tracked

3

Airflow Summit 2021 program

Filtering by: Big Data

Sessions & talks

Building the AirflowEventStream

2021-07-01
session
Jelle Munk (Adyen)

Or how to keep our traditional Java application up to date on everything big data. At Adyen we process tens of millions of transactions a day, a number that rises every day. This means that generating reports, training machine learning models, or any other operation that requires a bird’s-eye view of weeks or months of data requires the use of Big Data technologies. We recently migrated to Airflow for scheduling all batch operations on our on-premise Big Data cluster. Some of these operations require input from our merchants or our support team. Merchants can, for instance, subscribe to reports, choose their preferred time zone, and even specify which columns they want included. Once generated, these reports need to become available in our customer portal. So how do we keep track in our Customer Area of which reports have been generated in Airflow? How do we launch ad-hoc backfills when one of our merchants subscribes to a new report? How do we integrate all of this into our existing monitoring pipeline? This talk will focus on how we have successfully integrated our big data platform with our existing Java web applications and how Airflow (with some simple add-ons) played a crucial role in achieving this.

Running Big Data Applications in production with Airflow + Firebolt

2021-07-01
session

In this talk we’ll see some real-world examples from Firebolt customers demonstrating how Airflow is used to orchestrate operational data analytics applications with large data volumes, while keeping query latency low.

Usability Improvements: Debugging & Inspection Tooling

2021-07-01
session
Dinghang Yu (Pinterest), Yulei Li (Pinterest), Euccas Chen (Pinterest), Ashim Shrestha (Pinterest), Ace Haidrey (Pinterest)

The two most common user questions at Pinterest are: 1) Why is my workflow running so long? 2) Why did my workflow fail: is it my issue, or a platform issue? As with any big data organization, the workflow platform is just the orchestrator, but the “real” work is done on another layer, managed by another platform. There can be plenty of these, and figuring out the root cause of an issue can be mundane and time-consuming. At Pinterest, we set out to provide additional tooling in our Airflow webserver to make inspection quicker and to provide smart tips such as increased-runtime analysis, bottleneck identification, root cause analysis (RCA), and an easy way to backfill. We take a deeper look at the tooling provided to reduce the admin load and empower our users.