talk-data.com

Speaker

Itai Yaffe

Big Data Tech Lead, Nielsen Identity Engine

Filtering by: Airflow Summit 2020

Talks & appearances

At Nielsen Identity Engine, we use Spark to process tens of terabytes of data. Our ETLs, orchestrated by Airflow, spin up AWS EMR clusters with thousands of nodes per day. In this talk, we’ll guide you through migrating Spark workloads to Kubernetes with minimal changes to Airflow DAGs, using the open-sourced GCP Spark-on-K8s operator and the native integration we recently contributed to the Airflow project.