The Bloomberg Data Platform Engineering team manages, stores, and provides access to the business and financial data that financial professionals across the global capital markets rely on. Our team uses Apache Airflow to orchestrate data workflows across various applications and Bloomberg Terminal functions. Running more than 1,000 ingestion DAGs has presented unique scalability challenges, and over the years we have fine-tuned our Airflow cluster to meet them. In this session, we will share insights into several key Airflow parameters, some of them lesser known, that our team uses to optimize and scale the platform effectively.
talk-data.com
Speaker: Ivan Sayapin (2 talks)
Talks & appearances
Bloomberg’s Data Platform Engineering team powers some of the most valuable business and financial data that Bloomberg clients rely on. We recently built a configuration-driven system that allows non-engineers to onboard alternative datasets into the company’s ecosystem. This system uses Apache Airflow to orchestrate the data flow across different applications and Bloomberg Terminal functions. We run more than 1,500 dynamic DAGs, each tailored to a specific dataset’s needs, a scale that very few Airflow users reach. In this talk, we will review our high-level Airflow architecture and how we leverage dynamic DAGs in our ETL pipeline, and discuss some of the challenges we faced.
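The abstract does not describe the configuration format or generator, so the following is only a minimal sketch of the general config-driven pattern: one declarative entry per dataset is expanded into one pipeline definition. The config keys, dataset names, `PipelineSpec`, and `build_spec` are all hypothetical. To stay runnable without Airflow installed, it produces a framework-neutral spec and uses the standard library's `graphlib` to check task ordering. In a real Airflow deployment, the same loop would typically construct a `DAG` object per config at module level so the scheduler discovers it.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

# Hypothetical per-dataset configs, as a non-engineer might author them
# (e.g. in YAML). Keys and dataset names are illustrative assumptions only.
DATASET_CONFIGS = [
    {"dataset": "example_shipping", "schedule": "@hourly",
     "steps": ["extract", "validate", "load"]},
    {"dataset": "example_sentiment", "schedule": "@daily",
     "steps": ["extract", "normalize", "validate", "load", "publish"]},
]


@dataclass
class PipelineSpec:
    """Framework-neutral description of one generated DAG (hypothetical)."""
    dag_id: str
    schedule: str
    # Maps each task ID to the set of task IDs it depends on.
    dependencies: dict[str, set[str]] = field(default_factory=dict)

    def execution_order(self) -> list[str]:
        # Raises graphlib.CycleError if the dependencies contain a cycle.
        return list(TopologicalSorter(self.dependencies).static_order())


def build_spec(config: dict) -> PipelineSpec:
    """Expand one dataset config into a linear chain of tasks."""
    spec = PipelineSpec(dag_id=f"ingest_{config['dataset']}",
                        schedule=config["schedule"])
    previous = None
    for step in config["steps"]:
        # Each step depends on the one listed before it; the first has none.
        spec.dependencies[step] = {previous} if previous else set()
        previous = step
    return spec


# One spec per config: onboarding a dataset means adding config, not code.
GENERATED = {spec.dag_id: spec for spec in map(build_spec, DATASET_CONFIGS)}

if __name__ == "__main__":
    for dag_id, spec in GENERATED.items():
        print(dag_id, spec.schedule, spec.execution_order())
```

Keeping the config-to-spec step free of Airflow imports makes it easy to unit-test; an Airflow-specific layer would then only need to map each spec onto operators and task dependencies.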