As your organization scales to 20+ data science teams and 300+ DS/ML/DE engineers, you face a critical challenge: how to build a secure, reliable, and scalable orchestration layer that supports both fast experimentation and stable production workflows. We chose Airflow, and we haven't regretted it! But to make it truly work at our scale, we had to rethink its architecture from the ground up. In this talk, we'll share how we turned Airflow into a powerful MLOps platform built around one core capability: running pipelines across multiple Kubernetes GPU clusters from a single UI using per-cluster worker pools. To make the platform easy to use, we developed MLTool, our own library for fast, standardized DAG development; integrated Vault for secure secret management across teams; enabled real-time logging with S3 persistence; and built a custom SparkSubmitOperator for Kerberos-authenticated Spark/Hadoop jobs on Kubernetes. We also streamlined the developer experience: users can generate a GitLab repo and deploy a versioned pipeline to production in under 10 minutes! We're proud of what we've built, and our users are too. Now we want to share it with the world!
talk-data.com
Topic: GitLab (version_control, ci_cd, devops) — 1 talk tagged
Activity trend: peak of 3 talks/quarter over 2020-Q1 to 2026-Q1
Top Events: Google Cloud Next '24 (5), Google Cloud Next '25 (3), Airflow Summit 2025 (1), O'Reilly Data Science Books (1), Data + AI Summit 2025 (1), PyData Paris 2025 (1), Airflow Summit 2024 (1), PyData Paris 2024 (1), Blackout-Proof Coding: GitLab + Mastodon in Ruby (1), DATA MINER Big Data Europe Conference 2020 (1), CocoaHeads de Novembre chez Photoroom (1), Data Council 2023 (1)
Speaker: Aleksandr Shirokov