At Yahoo, we built a secure, scalable, and cost-efficient batch processing platform using Amazon MWAA to orchestrate Apache Flink jobs on EKS, managed by the Flink Kubernetes Operator. This setup enables dynamic job orchestration while meeting strict enterprise compliance standards.

In this session, we’ll share how Airflow DAGs:
- Dynamically launch, monitor, and clean up isolated Flink clusters per batch job, improving resource efficiency.
- Securely fetch the EKS kubeconfig, submit FlinkDeployment custom resources using FlinkKubernetesOperator, and poll job status using Airflow sensors.
- Integrate IAM for access control and meet Yahoo’s security requirements, including mutual TLS (mTLS) with Athenz.
- Optimize for cost and resilience through automated cleanup of jobs and the operator, and through handling of job failures and retries.

Join us for practical strategies and lessons from Yahoo’s production-scale Flink workflows in a Kubernetes environment.
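The launch, poll, and clean-up cycle described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Yahoo's implementation: `build_flink_deployment`, `run_batch_job`, `job_state`, and the `DeploymentClient` interface are hypothetical names, and the image, service account, resource sizes, and Flink version are placeholders. The manifest fields follow the Flink Kubernetes Operator's `FlinkDeployment` resource, and the job state is read from `status.jobStatus.state`.

```python
import time
from typing import Any, Callable, Dict, Protocol

# Flink job states treated as terminal in this sketch.
TERMINAL_OK = {"FINISHED"}
TERMINAL_FAILED = {"FAILED", "CANCELED"}


def build_flink_deployment(name: str, namespace: str, image: str,
                           jar_uri: str, parallelism: int = 2) -> Dict[str, Any]:
    """Build a FlinkDeployment manifest for one isolated per-job cluster."""
    return {
        "apiVersion": "flink.apache.org/v1beta1",
        "kind": "FlinkDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "image": image,                      # placeholder image
            "flinkVersion": "v1_17",             # assumed version
            "serviceAccount": "flink",           # assumed service account
            "flinkConfiguration": {"execution.runtime-mode": "BATCH"},
            "jobManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "taskManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "job": {
                "jarURI": jar_uri,
                "parallelism": parallelism,
                "upgradeMode": "stateless",
            },
        },
    }


class DeploymentClient(Protocol):
    """Hypothetical thin wrapper around the Kubernetes custom-objects API."""

    def create(self, manifest: Dict[str, Any]) -> None: ...
    def get(self, namespace: str, name: str) -> Dict[str, Any]: ...
    def delete(self, namespace: str, name: str) -> None: ...


def job_state(resource: Dict[str, Any]) -> str:
    """Read status.jobStatus.state, defaulting to UNKNOWN before it is set."""
    return resource.get("status", {}).get("jobStatus", {}).get("state", "UNKNOWN")


def run_batch_job(client: DeploymentClient, manifest: Dict[str, Any],
                  poll_seconds: float = 30.0, max_polls: int = 120,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Submit the deployment, poll until a terminal state, always clean up."""
    namespace = manifest["metadata"]["namespace"]
    name = manifest["metadata"]["name"]
    client.create(manifest)
    try:
        for _ in range(max_polls):
            state = job_state(client.get(namespace, name))
            if state in TERMINAL_OK:
                return state
            if state in TERMINAL_FAILED:
                raise RuntimeError(f"Flink job {name} ended in state {state}")
            sleep(poll_seconds)
        raise TimeoutError(f"Flink job {name} did not finish in time")
    finally:
        # Delete the per-job cluster whether the job succeeded, failed, or timed out.
        client.delete(namespace, name)
```

In an Airflow DAG these steps would more likely be split across tasks, with submission handled by `FlinkKubernetesOperator`, polling by a sensor, and deletion by a task that runs regardless of upstream outcome. The single function here only shows the ordering and the guarantee that cleanup always runs.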
Prakash Nandha Mukunthan