talk-data.com

Event

Airflow Summit 2023

2023-07-01 · Airflow Summit

Activities tracked: 6

Airflow Summit 2023 program

Filtered by: AI/ML

Sessions & talks


AI/ML is Changing Orchestration: How Kubernetes can accelerate Airflow

2023-07-01
session

It should be no surprise to the Airflow community that the hype around generative large language models (LLMs) and their wildly inventive chat front ends has brought significant attention to growing these models and feeding them a steady diet of data. For many communities in the infrastructure, orchestration, and data landscape, this is an opportunity to think big, help our users scale, and make the right foundational investments to sustain that growth over the long term. In this keynote I’ll talk about my own community, Kubernetes, and how we’re using the surge in AI/ML excitement to address long-standing gaps and unlock new capabilities. This work is not just for GPU workloads and the platform teams that support them; we are also thinking about how to accelerate Airflow users and other key workflow automators. We’re all in this together, and the future of orchestration is moving mountains of data at the speed of light!

Airflow at Delivery Hero: Running a data mesh with ~500 Airflow instances

2023-07-01
session

Have you ever wondered how Airflow could play a pivotal role in a data mesh architecture, hosting thousands of DAGs and hundreds of thousands of daily task runs? Let’s find out! Delivery Hero delivers food in 70 countries through 12 different brands and platforms. Thousands of engineers, analysts, and data scientists spread across many countries run analytics and ML services for all of these orders. Serving the workflow orchestration needs of such a large group is a challenge. This is where Airflow and data mesh come to the rescue: running more than 500 Airflow instances lets different teams own and curate their own data products. This presentation will explain how to efficiently set up and monitor Airflow at massive scale. It will also present a new feature for launching dynamic Airflow staging and development environments dedicated to each developer, and include a demo of the “Workspace” concept, a step toward multi-tenancy management.

Airflow at Faire: Democratizing ML feature store framework at scale

2023-07-01
session
Victoria Varney (Astronomer), Rafay Aleem (Faire)

Data science and machine learning are at the heart of Faire’s industry-celebrated marketplace (an a16z top-ranked marketplace), powering search, navigation, and risk functions with ML models trained on 3,000+ features defined by our data scientists. Previously, defining and backfilling features and maintaining their lifecycle was error-prone. A framework built on top of Airflow now lets our data scientists maintain and deploy their changes independently. We will explore how to use Airflow to power ML training and extend it with a framework that powers a feature store, and how dynamic DAGs let data scientists define new features and backfill them (a common problem in the ML world). The talk will provide insight into how Faire built a framework that produces the datasets used to train its models. It will also show that giving end users self-service tools is nothing to fear; instead, it frees engineering teams to focus on strategic initiatives.

Airflow at The Home Depot Canada: Observable orchestration platform for data integration and ML

2023-07-01
session

This session shows how we use Airflow in a federated way across all our business units to run a cost-effective platform. The platform flexibly accommodates different patterns of data integration, replication, and ML tasks, and supports DevOps tuning of DAGs across environments. It integrates with our open-source observability strategy, which gives our SREs consistent metrics, monitoring, and alerting for data tasks. We will share our opinionated approach to setting up DAGs: naming and folder-structure conventions, along with coding expectations such as using specific XCom entries to report processed elements, supporting state for DAGs that require it, and exposing configurable task options such as the runner type for Apache Beam tasks. We will also describe the “DevOps DAGs” we deploy in all our environments to handle specific platform maintenance and support.

Better Airflow with Metaflow: A modern human-centric ML infrastructure stack

2023-07-01
session

Airflow is a household name in data engineering: it is familiar to most data engineers, quick to set up, and, as the millions of data pipelines it has powered since 2014 prove, able to keep DAGs running. But as the demands of ML grow, there is a pressing need for tools that meet data scientists where they are and address two key issues: improving the developer experience and minimizing operational overhead. In this talk, we discuss the problem space and our approach to solving it with Metaflow, the open-source framework we developed at Netflix, which now powers thousands of business-critical ML projects at Netflix and other companies. We wanted to give data scientists the best possible UX, letting them focus on the parts they enjoy (e.g., modeling) while providing robust solutions for the foundational infrastructure: data, compute, orchestration (using Airflow), and versioning. We will also demo our latest work, which builds on top of Airflow.

Building and deploying LLM applications with Apache Airflow

2023-07-01
session

Behind the growing interest in generative AI and LLM-based enterprise applications lies an expanded set of requirements for data integration and ML orchestration. Enterprises want to use proprietary data to power LLM-based applications that create new business value, but they struggle to move beyond experimentation. The pipelines that power these models need to run reliably at scale, bringing together data from many sources and reacting continuously to changing conditions. This talk focuses on design patterns for using Apache Airflow to support LLM applications built on private enterprise data. We’ll walk through a real-world example, as well as a proposal to improve Airflow and add new Airflow providers that make it easier to interact with LLMs, such as OpenAI’s models (e.g., GPT-4) and those on Hugging Face, while working with both structured and unstructured data. In short, the talk shows how these Airflow patterns enable reliable, traceable, and scalable LLM applications within the enterprise.