talk-data.com

Topic

Airflow

Apache Airflow

workflow_management data_orchestration etl

Activities

tagged

Activity Trend

157 peak/qtr

2020-Q1 2026-Q1

Top Events

Airflow Summit 2025 139 Data Engineering Podcast 122 Airflow Summit 2024 92 Airflow Summit 2023 81 Airflow Summit 2022 52 Airflow Summit 2021 52 Airflow Summit 2020 39 O'Reilly Data Engineering Books 11 DATA MINER Big Data Europe Conference 2020 5 dbt Coalesce 2022 5 Airflow Monthly Virtual Town Hall- August 4 Airflow Monthly Virtual Town Hall- March 4

Top Speakers

Tobias Macey 122 Jarek Potiuk (Apache Software Foundation) 15 Kaxil Naik 12 Ash Berlin-Taylor (Astronomer) 11 Rafal Biegacz 10 Vikram Koka (Astronomer) 9 John Jackson 9 Brent Bovenzi (Astronomer) 7 Amogh Rajesh Desai 7 Maxime Beauchemin (Preset) 7 Tatiana Al-Chueyr Martins (Astronomer) 6 Jens Scheffler 6

Activities

Showing filtered results

All Video Podcast Book

Filtering by: Kaxil Naik ×

Airflow as an AI Agent's toolkit: Going beyond MCPs and unlocking Airflow's 1000+ Integrations

2025-07-01 · Airflow Summit 2025

session

by Kaxil Naik , Pavan kumar Gopidesu

AI/ML Cloud Computing Data Quality

What if your Airflow tasks could understand natural language AND adapt to schema changes automatically, while maintaining the deterministic, observable workflows we rely on? This talk introduces practical patterns for AI-native orchestration that preserve Airflow’s strengths while adding intelligence where it matters most. Through a real-world example, we’ll demonstrate AI-powered tasks that detect schema drift across multi-cloud systems and perform context-aware data quality checks that go beyond simple validation—understanding business rules, detecting anomalies, and generating validation queries from prompts like “check data quality across regions.” All within static DAG structures you can test and debug normally. We’ll show how AI becomes a first-class citizen by combining Airflow’s features, assets for schema context, Human-in-the-Loop for approvals, and AssetWatchers for automated triggers, with engines such as Apache DataFusion for high-performance query execution and support for cross-cloud data processing with unified access to multiple storage formats. These patterns apply directly to schema validation and similar cases where natural language can simplify complex operations. This isn’t about bolting AI onto Airflow. It’s about evolving how we build workflows, from brittle rules to intelligent adaptation, while keeping everything testable, auditable, and production-ready.

Introducing Apache Airflow® 3 – The Next Evolution in Orchestration

2025-07-01 · Airflow Summit 2025

session

by Jed Cunningham , Tzu-ping Chung , Kaxil Naik , Brent Bovenzi (Astronomer) , Buğra Öztürk , Ash Berlin-Taylor (Astronomer) , Vikram Koka (Astronomer) , Daniel Standish , Pierre Jeambrun , Vincent Beck , Constance Martineau , Amogh Rajesh Desai , Jens Scheffler

Apache Airflow® 3 is here, bringing major improvements to data orchestration. In this keynote, core Airflow contributors will walk through key enhancements that boost flexibility, efficiency, and user experience. Vikram Koka will kick things off with an overview of Airflow 3, followed by deep dives into DAG versioning (Jed Cunningham), enhanced backfilling (Daniel Standish), and a modernized UI (Brent Bovenzi & Pierre Jeambrun). Next, Ash Berlin-Taylor, Kaxil Naik, and Amogh Desai will introduce the Task Execution Interface and Task SDK, enabling tasks in any environment and language. Jens Scheffler will showcase the Edge Executor, while Constance Martineau, Tzu-ping Chung and Vincent Beck will demo event-driven scheduling and data assets. Finally, Buğra Öztürk will unveil CLI enhancements for automation and debugging. This keynote sets the stage for Airflow 3—don’t miss the chance to learn from the experts shaping the future of workflow orchestration!

Airflow 3 - Roadmap Discussion

2024-07-01 · Airflow Summit 2024

session

by Michał Modras (Google) , Kaxil Naik , Madison Swain-Bowden (Automattic) , Shubham Mehta (AWS Analytics) , Constance Martineau

Join us in this panel with key members of the community behind the development of Apache Airflow where we will discuss the tentative scope for the next generation, i.e. Airflow 3.

Gen AI using Airflow 3: A vision for Airflow RAGs

2024-07-01 · Airflow Summit 2024

session

by Kaxil Naik , Ash Berlin-Taylor (Astronomer)

AI/ML API GenAI LLM RAG Vector DB

Gen AI has taken the computing world by storm. As Enterprises and Startups have started to experiment with LLM applications, it has become clear that providing the right context to these LLM applications is critical. This process known as Retrieval augmented generation (RAG) relies on adding custom data to the large language model, so that the efficacy of the response can be improved. Processing custom data and integrating with Enterprise applications is a strength of Apache Airflow. This talk goes into details about a vision to enhance Apache Airflow to more intuitively support RAG, with additional capabilities and patterns. Specifically, these include the following Support for unstructured data sources such as Text, but also extending to Image, Audio, Video, and Custom sensor data LLM model invocation, including both external model services through APIs and local models using container invocation. Automatic Index Refreshing with a focus on unstructured data lifecycle management to avoid cumbersome and expensive index creation on Vector databases Templates for hallucination reduction via testing and scoping strategies

Building and deploying LLM applications with Apache Airflow

2023-07-01 · Airflow Summit 2023

session

by Kaxil Naik , Julian LaNeve (Astronomer)

AI/ML LLM

Behind the growing interest in Generate AI and LLM-based enterprise applications lies an expanded set of requirements for data integrations and ML orchestration. Enterprises want to use proprietary data to power LLM-based applications that create new business value, but they face challenges in moving beyond experimentation. The pipelines that power these models need to run reliably at scale, bringing together data from many sources and reacting continuously to changing conditions. This talk focuses on the design patterns for using Apache Airflow to support LLM applications created using private enterprise data. We’ll go through a real-world example of what this looks like, as well as a proposal to improve Airflow and to add additional Airflow Providers to make it easier to interact with LLMs such as the ones from OpenAI (such as GPT4) and the ones on HuggingFace, while working with both structured and unstructured data. In short, this shows how these Airflow patterns enable reliable, traceable, and scalable LLM applications within the enterprise.

Introducing airflowctl: A CLI to streamline getting started with Airflow

2023-07-01 · Airflow Summit 2023

session

by Kaxil Naik

Docker Python

New users starting with Airflow frequently encounter several challenges, ranging from the complexities of Containers and virtual environments to the Python dependency hell. Moreover, their familiarity with tools such as Docker, docker-compose, and Helm might be somewhat limited and even overkill. In contrast, seasoned Airflow users encounter their problems, encompassing configuration conflics with ongoing Airflow projects and intricacies stemming from Docker and docker-compose configurations and lack of visibility into all the projects. With airflowctl, users can install & setup Airflow using a single command. For existing users, they can use it to manage multiple Airflow projects with different Airflow versions on the same machine. This allows creating & debugging DAGs in an IDE seamlessly. Agenda for the call: Why airflowctl? Goal Current functionality & Demo Vision / Roadmap

What's new in Airflow 2.3?

2022-07-01 · Airflow Summit 2022

session

by Kaxil Naik

JSON

This session will talk about the awesome new features the community has built that would be part of Airflow 2.3. Highlights: Dynamic Task Mapping DB. Downgrades Pruning old DB records Connections using JSON UI Improvements

Airflow loves Kubernetes

2021-07-01 · Airflow Summit 2021

session

by Jarek Potiuk (Apache Software Foundation) , Kaxil Naik

Docker Kubernetes

In this talk Jarek and Kaxil will talk about official, community support for running Airflow in the Kubernetes environment. The full support for Kubernetes deployments was developed by the community for quite a while and in the past users of Airflow had to rely on 3rd-party images and helm-charts to run Airflow on Kubernetes. Over the last year community members made an enormous effort to provide robust, simple and versatile support for those deployments that would respond to all kinds of Airflow users. Starting from official container image, through quick-start docker-compose configuration, culminating in April with release of the official Helm Chart for Airflow. This talk is aimed for Airflow users who would like to make use of all the effort. The users will learn how to: Extend or customize Airflow Official Docker Image to adapt it to their needs Run quickstart docker-compose environment where they can quickly verify their images Configure and deploy Airflow on Kubernetes using the Official Airflow Helm chart

Contributing to Apache Airflow | Journey to becoming Airflow's leading contributor

2021-07-01 · Airflow Summit 2021

session

by Kaxil Naik

GitHub Python

From not knowing Python (let alone Airflow), and from submitting the first PR that fixes typo to becoming Airflow Committer, PMC Member, Release Manager, and #1 Committer this year, this talk walks through Kaxil’s journey in the Airflow World. The second part of this talk explains: how you can also start your OSS journey by contributing to Airflow Expanding familiarity with a different part of the Airflow codebase Continue committing regularly & steadily to become Airflow Committer. (including talking about current Guidelines of becoming a Committer) Different mediums of communication (Dev list, users list, Slack channel, Github Discussions etc)

Upgrading to Apache Airflow 2

2021-07-01 · Airflow Summit 2021

session

by Kaxil Naik

Airflow 2.0 was a big milestone for the Airflow community. However, companies and enterprises are still facing difficulties in upgrading to 2.0. In this talk, I would like to focus and highlight the ideal upgrade path and talk about upgrade_check CLI tool separation of providers registering connections types important 2.0 Airflow configs DB Migration deprecated feature around Airflow Plugins

Ask me anything with Airflow members

2020-07-01 · Airflow Summit 2020

session

by Jarek Potiuk (Apache Software Foundation) , Kaxil Naik

Ask me Anything with a group of Airflow committers & PMC members.

Keynote: Future of Airflow

2020-07-01 · Airflow Summit 2020

session

by Kamil Bregula , Jarek Potiuk (Apache Software Foundation) , Kaxil Naik , Daniel Imberman , Ash Berlin-Taylor (Astronomer) , Tomasz Urbaszek

A team of core committers explain what is coming to Airflow 2.0.