There was a post on the data engineering subreddit recently about how difficult it is to keep up with the data engineering world. Did you learn Hadoop? Great, we're on Snowflake, BigQuery, and Databricks now. Just learned Airflow? Well, now we have Airflow 3.0. And the list goes on. But what doesn't change, and what have the lessons of the past decade been? That's what I'll be covering in this talk: real lessons and realities that come up time and time again, whether you're working for a start-up or a large enterprise.
Apache Bigtop is a time-proven open-source software stack for building data platforms, built around the Hadoop and Spark ecosystem since 2011. Its software composition has changed over that long period; recently, the job scheduler was removed, mainly due to inactive development. The speaker believes that Airflow fits perfectly into this gap and proposes incorporating it into the Bigtop stack. This presentation will show how easily users can build a data platform with Bigtop that includes Airflow, and how Airflow can integrate that software through its wide range of providers and enterprise-ready features such as Kerberos support.
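To make the integration concrete, here is a minimal sketch (not taken from the talk) of how an Airflow DAG might drive Bigtop-provisioned Spark components through Airflow's Spark provider. The connection ID "spark_default" and the application path are illustrative assumptions.

```python
# Minimal sketch: orchestrating a Spark job on a Bigtop-built cluster
# via the apache-airflow-providers-apache-spark provider.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="bigtop_spark_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Submits a Spark application to the cluster configured under the
    # "spark_default" connection. When Kerberos is enabled on the cluster,
    # Airflow's kerberos ticket renewer (the `airflow kerberos` daemon)
    # keeps the worker's credentials valid. Both IDs/paths here are
    # hypothetical placeholders.
    run_job = SparkSubmitOperator(
        task_id="run_spark_job",
        conn_id="spark_default",
        application="/opt/jobs/example_job.py",
    )
```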
As your organization scales to 20+ data science teams and 300+ DS/ML/DE engineers, you face a critical challenge: how to build a secure, reliable, and scalable orchestration layer that supports both fast experimentation and stable production workflows. We chose Airflow — and didn’t regret it! But to make it truly work at our scale, we had to rethink its architecture from the ground up. In this talk, we’ll share how we turned Airflow into a powerful MLOps platform through its core capability: running pipelines across multiple K8s GPU clusters from a single UI (!) using per-cluster worker pools. To support ease of use, we developed MLTool — our own library for fast and standardized DAG development, integrated Vault for secure secret management across teams, enabled real-time logging with S3 persistence and built a custom SparkSubmitOperator for Kerberos-authenticated Spark/Hadoop jobs in Kubernetes. We also streamlined the developer experience — users can generate a GitLab repo and deploy a versioned pipeline to prod in under 10 minutes! We’re proud of what we’ve built — and our users are too. Now we want to share it with the world!
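A hedged sketch of the per-cluster routing idea follows. With the CeleryExecutor, each K8s GPU cluster can run workers that listen on their own queue, so a task is pinned to a cluster simply by naming that queue. The queue names and the training callable below are illustrative assumptions, not MLTool's actual API.

```python
# Sketch: routing tasks to per-cluster worker pools via Celery queues.
# Assumes workers inside each GPU cluster are started with, e.g.:
#   airflow celery worker -q gpu-cluster-a
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def train_model():
    # Placeholder for the real training step; runs on whichever
    # cluster's worker pool consumes the queue below.
    print("training on cluster A's worker pool")


with DAG(
    dag_id="multi_cluster_training",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    train_a = PythonOperator(
        task_id="train_on_cluster_a",
        python_callable=train_model,
        queue="gpu-cluster-a",  # hypothetical queue name; pins the task to cluster A
    )
```

The design point is that `queue` is a standard BaseOperator argument, so a single Airflow UI and scheduler can fan pipelines out across clusters without any custom executor code.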