talk-data.com

Topic

Airflow

Apache Airflow

workflow_management data_orchestration etl

Activity Trend

Peak of 157 per quarter, 2020-Q1 to 2026-Q1

Top Events

Airflow Summit 2025: 139
Data Engineering Podcast: 122
Airflow Summit 2024: 92
Airflow Summit 2023: 81
Airflow Summit 2022: 52
Airflow Summit 2021: 52
Airflow Summit 2020: 39
O'Reilly Data Engineering Books: 11
DATA MINER Big Data Europe Conference 2020: 5
dbt Coalesce 2022: 5
Airflow Monthly Virtual Town Hall - August: 4
Airflow Monthly Virtual Town Hall - March: 4

Top Speakers

Tobias Macey: 122
Jarek Potiuk (Apache Software Foundation): 15
Kaxil Naik: 12
Ash Berlin-Taylor (Astronomer): 11
Rafal Biegacz: 10
Vikram Koka (Astronomer): 9
John Jackson: 9
Brent Bovenzi (Astronomer): 7
Amogh Rajesh Desai: 7
Maxime Beauchemin (Preset): 7
Tatiana Al-Chueyr Martins (Astronomer): 6
Jens Scheffler: 6

Activities


Filtering by: Airflow Summit 2024

The road ahead: What’s coming in Airflow 3 and beyond?

2024-07-01 · Airflow Summit 2024

session

by Vikram Koka (Astronomer)

AI/ML

Apache Airflow has emerged as the de facto standard for data orchestration. Over the last couple of years, Airflow has also seen increasing adoption for ML and AI use cases. It has been almost four years since the release of Airflow 2, and as a community we have agreed that it’s time for a major foundational release in the form of Airflow 3. This talk will introduce the vision behind Airflow 3, including the emerging technology trends in the industry and how Airflow will evolve in response. Specifically, this will include an overview of the architectural changes in Airflow to support emerging use cases and distributed data infrastructure models. This talk will also introduce the major features and the desired outcomes of the release. Airflow 3 will be a foundational release, so this talk will also cover the new concepts introduced as part of Airflow 3, which may be fully realized in follow-on 3.x releases. The goal of this talk is to raise awareness about Airflow 3 and to get feedback from the Airflow community while the release is still in the development phase.

The Silent Symphony: Keeping Airflow's CI/CD and Dev Tools in Tune

2024-07-01 · Airflow Summit 2024

session

by Jarek Potiuk (Apache Software Foundation)

CI/CD

Apache Airflow relies on a silent symphony behind the scenes: its CI/CD (Continuous Integration/Continuous Delivery) and development tooling. This presentation explores the critical role these tools play in keeping Airflow efficient and innovative. We’ll delve into how robust CI/CD ensures bug fixes and improvements are seamlessly integrated, while well-maintained development tools empower developers to contribute effectively. Airflow’s power comes from a well-oiled machine – its CI/CD and development tools. This presentation dives into the world of these often-overlooked heroes. We’ll explore how seamless CI/CD pipelines catch and fix issues early, while robust development tools empower efficient coding and collaboration. Discover how you can use and contribute to a thriving Airflow ecosystem by ensuring these crucial tools stay in top shape.

Unleash the Power of AI: Streamlining Airflow DAG Development with AI-Driven Automation

2024-07-01 · Airflow Summit 2024

session

by Jeetendra Vaidya, Joseph Morotti, Sriharsh Adari

AI/ML GenAI

Nowadays, conversational AI is no longer exclusive to large enterprises. It has become more accessible and affordable, opening up new possibilities and business opportunities. In this session, discover how you can leverage Generative AI as your AI pair programmer to suggest DAG code and recommend entire functions in real-time, directly from your editor. Visualize how to harness the power of ML, trained on billions of lines of code, to transform natural language prompts into coding suggestions. Seamlessly cycle through lines of code, complete function suggestions, and choose to accept, reject, or edit them. Witness firsthand how Generative AI provides recommendations based on the project’s context and style conventions. The objective is to equip you with techniques that allow you to spend less time on boilerplate and repetitive code patterns, and more time on what truly matters: building exceptional orchestration software.

Unlocking FMOps/LLMOps with Airflow: A guide to operationalizing and managing Large Language Models

2024-07-01 · Airflow Summit 2024

session

by Parnab Basak (Amazon Web Services)

AI/ML Cloud Computing GenAI LLM MLOps

In the last few years, Large Language Models (LLMs) have risen to prominence as outstanding tools capable of transforming businesses. However, bringing such solutions and models into business-as-usual operations is not an easy task. In this session, we delve into the operationalization of generative AI applications using MLOps principles, leading to the introduction of foundation model operations (FMOps) or LLM operations using Apache Airflow. We further zoom into aspects of expected people and process mindsets, new techniques for model selection and evaluation, data privacy, and model deployment. Additionally, learn how you can use the prescriptive features of Apache Airflow to aid your operational journey. Whether you are building using out-of-the-box models (open-source or proprietary), creating new foundation models from scratch, or fine-tuning an existing model, with the structured approaches described you can effectively integrate LLMs into your operations, enhancing efficiency and productivity without causing disruptions in the cloud or on-premises.

Unlocking the Power of AI at Ford: A Behind-the-Scenes Look at Mach1ML and Airflow

2024-07-01 · Airflow Summit 2024

session

by Prince Bose (Mach1ML - Ford Motor Company), Elona Zharri, Nikhil Nandoskar

AI/ML Cyber Security

Ford Motor Company is undergoing a significant transformation, embracing AI and Machine Learning to power its smart mobility strategy, enhance customer experiences, and drive innovation in the automotive industry. Mach1ML, Ford’s multi-million dollar ML platform, plays a crucial role in this journey by empowering data scientists and engineers to efficiently build, deploy, and manage ML models at scale. This presentation will delve into how Mach1ML leverages Apache Airflow as its orchestration layer to tackle the challenges of complex ML workflows that include disparate systems, manual processes, security concerns, and deployment complexities. We will explore the benefits of using Airflow, such as increased efficiency, improved reliability, enhanced scalability, and faster time-to-value. Additionally, we will showcase how Mach1ML utilizes Airflow capabilities to generate reusable templates and streamline environment promotions to further empower Ford’s AI practitioners and accelerate the delivery of cutting-edge AI-powered solutions supporting the next generation of vehicles.

Unlocking the Power of Airflow Beyond Data Engineering at Cloudflare

2024-07-01 · Airflow Summit 2024

session

by Jet Mariscal (Cloudflare)

AI/ML Cloudflare Data Engineering Data Science ETL/ELT

While Airflow is widely known for orchestrating and managing workflows, particularly in the context of data engineering, data science, ML (Machine Learning), and ETL (Extract, Transform, Load) processes, its flexibility and extensibility make it a highly versatile tool suitable for a variety of use cases beyond these domains. In fact, Cloudflare has publicly shared in the past an example of how Airflow was leveraged to build a system that automates datacenter expansions. In this talk, I will share a few more of our use cases beyond traditional data engineering, demonstrating Airflow’s sophisticated capabilities for orchestrating a wide variety of complex workflows, and discussing how Airflow played a crucial role in building some of the highly successful autonomous systems at Cloudflare, from handling automated bare metal server diagnostics and recovery at scale, to Zero Touch Provisioning that is helping us accelerate the rollout of inference-optimized GPUs in 150+ cities in multiple countries globally.

Using Airflow operational data to optimize Cloud services

2024-07-01 · Airflow Summit 2024

session

by Olivier Daneau

Astronomer BigQuery Cloud Computing Oracle Snowflake

Cost management is a continuous challenge for our data teams at Astronomer. Understanding the expenses associated with running our workflows is not always straightforward, and identifying which process ran a query causing unexpected usage on a given day can be time-consuming. In this talk, we will showcase an Airflow Plugin and specific DAGs developed and used internally at Astronomer to track and optimize the costs of running DAGs. Our internal tool monitors Snowflake query costs, provides insights, and sends alerts for abnormal usage. With it, Astronomer identified and refactored its most costly DAGs, resulting in an almost 25% reduction in Snowflake spending. We will demonstrate how to track Snowflake-related DAG costs and discuss how the tool can be adapted to any database supporting query tagging like BigQuery, Oracle, and more. This talk will cover the implementation details and show how Airflow users can effectively adopt this tool to monitor and manage their DAG costs.
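
The query-tagging idea in this abstract can be illustrated with a minimal sketch. It is not the Astronomer plugin, whose implementation is not shown here. It assumes each task sets a JSON query tag through Snowflake's `QUERY_TAG` session parameter before running its queries, and that per-query costs are later available as `(query_tag, credits)` rows from whatever cost view is in use. The function names and the tag layout are hypothetical.

```python
import json
from collections import defaultdict


def build_query_tag(dag_id: str, task_id: str, run_id: str) -> str:
    """Serialize Airflow context into a JSON string usable as a query tag."""
    return json.dumps(
        {"dag_id": dag_id, "task_id": task_id, "run_id": run_id},
        sort_keys=True,
    )


def tag_statement(query_tag: str) -> str:
    """Return the statement that tags later queries in the same session."""
    escaped = query_tag.replace("'", "''")  # escape single quotes for SQL
    return f"ALTER SESSION SET QUERY_TAG = '{escaped}'"


def cost_per_dag(history_rows):
    """Sum credits per DAG from (query_tag, credits) rows.

    Rows whose tag is missing or is not the expected JSON object
    are grouped under "untagged".
    """
    totals = defaultdict(float)
    for query_tag, credits in history_rows:
        try:
            dag_id = json.loads(query_tag)["dag_id"]
        except (TypeError, ValueError, KeyError):
            dag_id = "untagged"
        totals[dag_id] += credits
    return dict(totals)
```

Grouping untagged rows separately keeps ad hoc or legacy queries visible instead of silently dropping them, which matters when the goal is to explain unexpected usage.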

Using the power of Apache Airflow and Ray for Scalable AI deployments

2024-07-01 · Airflow Summit 2024

session

by Marwan Sarieddine (Anyscale), Venkata Jagannath

AI/ML LLM

Many organizations struggle to create a well-orchestrated AI infrastructure, using separate and disconnected platforms for data processing, model training, and inference, which slows down development and increases costs. There’s a clear need for a unified system that can handle all aspects of AI development and deployment, regardless of the size of data or models. Join our breakout session to see how our comprehensive solution simplifies the development and deployment of large language models in production. Learn how to streamline your AI operations by implementing an end-to-end ML lifecycle on your custom data, including automated LLM fine-tuning, LLM evaluation, LLM serving, and LoRA deployments.

Weathering the Cloud Storms With Multi-Region Airflow Workflows

2024-07-01 · Airflow Summit 2024

session

by Amit Chauhan

Cloud Computing

Cloud availability zones and regions are not immune to outages. These zones regularly go down, and regions become unavailable due to natural disasters or human-caused incidents. Thus, if an availability zone or region goes down, so do your Airflow workflows and applications… unless your Airflow workflows function across multiple geographic locations. This hands-on session introduces you to the design patterns of multi-region Airflow workflows in the cloud, which can tolerate zone and region-level incidents. We will start with a traditional single-region configuration and then switch to a multi-region setting. By the end, we’ll have a working prototype of a multi-region Airflow pipeline that recovers from region-level outages within a few seconds, with no data loss or disruption to the application layer.

What If...? Running Airflow Tasks without the workers

2024-07-01 · Airflow Summit 2024

session

by Wei Lee

Airflow executes all tasks on the workers, including deferrable operators that must run on the workers before deferring to the triggerer. However, running some tasks directly from the triggerer can be beneficial in certain situations. This presentation will explain how deferrable operators function and examine ways to modify the Airflow implementation to enable tasks to run directly from the triggerer.
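
The worker, triggerer, worker hand-off described above can be sketched as a small standard-library simulation. This is not Airflow's API: `Deferred`, `SleepTrigger`, `WaitTask`, and `run_task` are hypothetical names chosen for illustration, and the single-process driver stands in for separate worker and triggerer processes.

```python
import asyncio


class Deferred(Exception):
    """Raised by a task to hand control to the triggerer (hypothetical)."""

    def __init__(self, trigger, resume_method):
        super().__init__("task deferred")
        self.trigger = trigger
        self.resume_method = resume_method


class SleepTrigger:
    """Async wait that needs no worker slot while it runs."""

    def __init__(self, seconds):
        self.seconds = seconds

    async def run(self):
        await asyncio.sleep(self.seconds)
        return {"status": "done", "waited": self.seconds}


class WaitTask:
    def __init__(self):
        self.log = []

    def execute(self):
        # Phase 1 runs on a worker, then frees the slot by deferring.
        self.log.append("worker: execute")
        raise Deferred(SleepTrigger(0.01), "execute_complete")

    def execute_complete(self, event):
        # Phase 3 runs on a worker again, with the trigger's event.
        self.log.append("worker: resume")
        return event["status"]


def run_task(task):
    """Drive one task through worker -> triggerer -> worker."""
    try:
        return task.execute()
    except Deferred as deferred:
        # Phase 2: the async wait happens outside any worker slot.
        task.log.append("triggerer: wait")
        event = asyncio.run(deferred.trigger.run())
        return getattr(task, deferred.resume_method)(event)
```

In this model the first worker phase does nothing except defer, which is the overhead the talk proposes to avoid by letting such tasks start directly from the triggerer.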

Why Do Airflow Tasks Fail? An Analysis through Machine Learning Techniques

2024-07-01 · Airflow Summit 2024

session

by David Xue (Astronomer), Julian LaNeve (Astronomer)

AI/ML MLOps NLP

There are 3 certainties in life: death, taxes, and data pipelines failing. Pipelines may fail for a number of reasons: you may run out of memory, your credentials may expire, an upstream data source may not be reliable, etc. But there are patterns we can learn from! Join us as we walk through an analysis we’ve done on a massive dataset of Airflow failure logs. We’ll show how we used natural language processing and dimensionality reduction methods to explore the latent space of Airflow task failures in order to cluster, visualize, and understand failures. We’ll conclude the talk by walking through mitigation methods for common task failure reasons, and walk through how we can use Airflow to build an MLOps platform to turn this one-time analysis into a reliable, recurring activity.
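
As a rough illustration of grouping failure messages by similarity, the sketch below uses a much simpler stand-in than the methods named in the abstract: bag-of-words vectors with cosine similarity and greedy grouping, rather than NLP embeddings and dimensionality reduction. It is not the speakers' method. The function names, the digit masking, and the 0.6 threshold are assumptions.

```python
import math
import re
from collections import Counter


def vectorize(message: str) -> Counter:
    """Lowercase, mask digit runs, and count word tokens."""
    masked = re.sub(r"\d+", "<num>", message.lower())
    return Counter(re.findall(r"[a-z_<>]+", masked))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(messages, threshold=0.6):
    """Greedily assign each message to the first cluster it resembles."""
    clusters = []  # list of (representative vector, member messages)
    for message in messages:
        vec = vectorize(message)
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(message)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append((vec, [message]))
    return [members for _, members in clusters]
```

Masking digits before counting tokens lets messages that differ only in return codes, IDs, or timestamps fall into the same group.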

Winning Strategies: Powering a World Series Victory with Airflow Orchestration

2024-07-01 · Airflow Summit 2024

session

by Oliver Dykstra (Texas Rangers)

Agile/Scrum

Dive into the winning playbook of the 2023 World Series Champions Texas Rangers, and discover how they leverage Apache Airflow to streamline their data pipelines. In this session, we’ll explore how real-world data pipelines enable agile decision-making and drive competitive advantage in the high-stakes world of professional baseball, all by using Airflow as an orchestration platform. Whether you’re a seasoned data engineer or just starting out, this session promises actionable strategies to elevate your data orchestration game to championship levels.
