Yes, you read that right — 200,000 pipelines, nearly 1 million task executions per day, all powered by a single Airflow instance. In this session, we’ll take you behind the scenes of one of the boldest orchestration projects ever attempted: how Uber’s data platform team is executing what might be the largest Apache Airflow migration in history — and doing it straight to Airflow 3. From scaling challenges and architectural choices to lessons learned in high-throughput orchestration, this is a deep dive into the tech, the chaos, and the strategy behind making data fly at unprecedented scale.
talk-data.com
Speaker
Sumit Maheshwari
3
talks
Filter by Event / Source
Talks & appearances
3 activities · Newest first
Up until a few years ago, teams at Uber used multiple data workflow systems, with some based on open source projects such as Apache Oozie, Apache Airflow, and Jenkins while others were custom built solutions written in Python and Clojure. Every user who needed to move data around had to learn about and choose from these systems, depending on the specific task they needed to accomplish. Each system required additional maintenance and operational burdens to keep it running, troubleshoot issues, fix bugs, and educate users. After this evaluation, and with the goal in mind of converging on a single workflow system capable of supporting Uber’s scale, we settled on an Airflow-based system. The Airflow-based DSL provided the best trade-off of flexibility, expressiveness, and ease of use while being accessible for our broad range of users, which includes data scientists, developers, machine learning experts, and operations employees. This talk will focus on scaling Airflow to Uber’s scale and providing a no-code seamless user experience
In the realm of data engineering, there is a prevalent tendency for professionals to develop similar Directed Acyclic Graphs (DAGs) to manage analogous tasks. Leveraging Dag Params presents an effective strategy for mitigating redundancy within these DAGs. Moreover, the utilization of Dag Params facilitates seamless enforcement of user inputs, thereby streamlining the process of incorporating validations into the DAG codebase.