talk-data.com

Event

Airflow Summit 2025

2025-07-01 · Airflow Summit

Activities tracked

14

Airflow Summit 2025 program

Filtering by: LLM

Sessions & talks

Showing 1–14 of 14 · Newest first


Airflow as a Platform for Agentic AI Digital Products Within Enterprises

2025-07-01
session
Vikram Koka (Astronomer), Peeyush Rai

In this keynote, Peeyush Rai and Vikram Koka walk through how Airflow is used as part of an agentic AI platform serving insurance companies, which runs on all the major public clouds and leverages models from OpenAI, Google (Gemini), and Anthropic (Claude, via AWS Bedrock). The talk covers the details of an actual end-user business workflow, including gathering the relevant financial data to make a decision, as well as the tricky challenge of handling AI hallucinations with new Airflow capabilities such as “Human in the loop”. The talk offers something for both business and technical audiences. Business users will get a clear view of what it takes to bring an AI application into production and how to align their operations and business teams with an AI-enabled workflow. Technical users will walk away with practical insights on how to orchestrate complex business processes, enabling seamless collaboration between Airflow, AI agents, and humans in the loop.
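
The talk's actual gating logic is not public; as a minimal sketch of the idea behind a human-in-the-loop checkpoint, the function below routes an LLM decision either to auto-approval or to a human reviewer. All field names and the threshold are illustrative assumptions, not the speakers' implementation.

```python
# Hypothetical confidence gate for routing LLM output to human review.
# Field names ("confidence", "unsupported_claims") and the threshold
# are illustrative, not from the talk.

def route_llm_output(output: dict, threshold: float = 0.85) -> str:
    """Return the next step: auto-approve, or escalate to a human reviewer.

    `output` is assumed to carry a confidence score in [0, 1]; anything
    below the threshold, or any answer flagged as unsupported by the
    retrieved evidence, goes to a human.
    """
    if output.get("unsupported_claims"):  # possible hallucination
        return "human_review"
    if output.get("confidence", 0.0) < threshold:
        return "human_review"
    return "auto_approve"
```

In an Airflow DAG this kind of check would typically sit in a branching task, with the human-review branch pausing the workflow until a reviewer signs off.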

Airflow DAG Upgrade Agent: Using Google ADK, Gemini, and Vertex AI RAG Engine to Accelerate Upgrades

2025-07-01
session

Join us to explore the DAG Upgrade Agent. Developed with the Google Agent Development Kit and powered by Gemini, the DAG Upgrade Agent uses a rules-based framework to analyze DAG code, identify compatibility issues between core Airflow and provider package versions, and generate precise upgrade recommendations and automated code conversions. Perfect for upcoming Airflow 3.0 migrations.
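
The agent's actual rule set is not described here; as a minimal stand-in, the sketch below shows what one "rule" in a rules-based DAG analyzer can look like, using `ast` to flag two import paths that really were deprecated in Airflow 2.

```python
import ast

# Illustrative rule table: two Airflow 1.x import paths that moved in
# Airflow 2. The real agent's rules are far more extensive.
DEPRECATED_IMPORTS = {
    "airflow.operators.python_operator": "airflow.operators.python",
    "airflow.operators.bash_operator": "airflow.operators.bash",
}

def find_deprecated_imports(dag_source: str) -> list[tuple[str, str]]:
    """Return (old_module, suggested_module) pairs found in DAG source."""
    findings = []
    for node in ast.walk(ast.parse(dag_source)):
        if isinstance(node, ast.ImportFrom) and node.module in DEPRECATED_IMPORTS:
            findings.append((node.module, DEPRECATED_IMPORTS[node.module]))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in DEPRECATED_IMPORTS:
                    findings.append((alias.name, DEPRECATED_IMPORTS[alias.name]))
    return findings
```

A static pass like this can feed the LLM step: the rule findings anchor the model's suggested code conversions instead of letting it guess.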

Airflow Uses in an on-prem Research Setting

2025-07-01
session

KP Division of Research uses Airflow as a central technology for integrating diverse technologies in an agile setting. We will present a set of use cases for AI/ML workloads, including imaging analysis (tissue segmentation, mammography), NLP (early identification of psychosis), LLM processing (identification of vessel diameter from radiological impressions), and other large data-processing tasks. We create these “short-lived” project workflows to accomplish specific aims and may never run the job again, so leveraging generalized patterns is crucial to implementing these jobs quickly. Our Advanced Computational Infrastructure comprises multiple Kubernetes clusters, and we use Airflow to democratize the use of the batch-level resources in those clusters. We use Airflow form-based parameters to deploy pods running R and Python scripts, where generalized parameters are injected into scripts that follow internal programming patterns. Finally, we also leverage Airflow to create headless services inside Kubernetes for large computational workloads (Spark & H2O) that subsequent pods consume ephemerally.
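
The internal patterns are not shown in the abstract; as a hedged sketch of the "generalized parameters injected into scripts" idea, the snippet below fills a shared command template from a parameter dict such as one collected via Airflow form-based params. The script name and flags are made up for illustration.

```python
from string import Template

# Sketch of a shared job template filled from per-project parameters
# (e.g., values collected through Airflow's form-based params).
# The script name and flags are hypothetical.
SCRIPT_TEMPLATE = Template(
    "Rscript run_analysis.R --input $input_path --cohort $cohort --out $output_path"
)

def render_job_command(params: dict) -> str:
    """Fill the shared template; raises KeyError if a parameter is missing."""
    return SCRIPT_TEMPLATE.substitute(params)
```

The strict `substitute` (rather than `safe_substitute`) makes a missing parameter fail at task start rather than inside a long-running pod.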

Airflow Without Borders: A Journey into Internationalization (i18n)

2025-07-01
session

One of the exciting new features in Airflow 3 is internationalization (i18n), bringing multilingual support to the UI and making Airflow more accessible to users worldwide. This talk will highlight the UI changes made to support different languages, including locale-aware adjustments. We’ll discuss how translations are contributed and managed — including the use of LLMs to accelerate the process — and why human review remains an essential part of it. We’ll present the i18n policy designed to ensure long-term maintainability, along with the tooling developed to support it. Finally, you’ll learn how to get involved and contribute to Airflow’s global reach by translating or reviewing content in your language.
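
Airflow's actual i18n implementation lives in its frontend and differs in detail; as a toy illustration of the locale-fallback pattern common to UI internationalization, with made-up keys and strings:

```python
# Minimal locale lookup with English fallback. Keys and translations
# are invented for illustration; Airflow's real catalog is larger and
# managed in its UI codebase.
TRANSLATIONS = {
    "en": {"dags.title": "DAGs", "runs.failed": "Failed"},
    "de": {"runs.failed": "Fehlgeschlagen"},  # partial translation
}

def translate(key: str, locale: str, default_locale: str = "en") -> str:
    """Look up `key` for `locale`, falling back to the default locale,
    then to the key itself."""
    return TRANSLATIONS.get(locale, {}).get(
        key, TRANSLATIONS[default_locale].get(key, key)
    )
```

The fallback chain is what makes partial, incrementally contributed translations usable: untranslated strings degrade to English rather than breaking the UI.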

Automating Healthcare Triage with Airflow and Large Language Models

2025-07-01
session

I will talk about how Apache Airflow is used in the healthcare sector with the integration of LLMs to enhance efficiency. Healthcare generates vast volumes of unstructured data daily, from clinical notes and patient intake forms to chatbot conversations and telehealth reports. Medical teams struggle to keep up, leading to delays in triage and missed critical symptoms. This session explores how Apache Airflow can be the backbone of an automated healthcare triage system powered by Large Language Models (LLMs). I’ll demonstrate how I designed and implemented an Airflow DAG orchestration pipeline that automates the ingestion, processing, and analysis of patient data from diverse sources in real time. Airflow schedules and coordinates data extraction, preprocessing, LLM-based symptom extraction, and urgency classification, and finally routes actionable insights to healthcare professionals. The session will focus on the following: • Managing complex workflows in healthcare data pipelines, • Safely integrating LLM inference calls into Airflow tasks, • Designing human-in-the-loop checkpoints for ethical AI usage, • Monitoring workflow health and data quality with Airflow.
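
The speaker's schema is not given; as a hedged sketch of the "safely integrating LLM inference" step, the function below validates the model's JSON output and routes the case, defaulting to human review whenever the output is unparseable or out of schema. Field names, urgency levels, and queue names are assumptions.

```python
import json

# Hypothetical triage router: validate LLM output, route the case.
# "urgency" values and queue names are illustrative, not the talk's schema.
VALID_URGENCY = {"emergency", "urgent", "routine"}

def route_triage(llm_response: str) -> dict:
    """Parse the LLM's JSON reply and decide where the case goes.

    Anything unparseable or out-of-schema is sent to human review,
    the safe default for a clinical pipeline.
    """
    try:
        record = json.loads(llm_response)
        urgency = record["urgency"]
        if urgency not in VALID_URGENCY:
            raise ValueError(urgency)
    except (ValueError, KeyError, TypeError):
        return {"queue": "human_review", "reason": "invalid_llm_output"}
    queue = "on_call_clinician" if urgency == "emergency" else "triage_nurse"
    return {"queue": queue, "urgency": urgency}
```

Treating malformed model output as a routing decision, rather than a task failure, keeps the DAG running while still surfacing the case to a human.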

Automating Threat Intelligence with Airflow, XDR, and LLMs using the MITRE ATT&CK Framework

2025-07-01
session

Security teams often face alert fatigue from massive volumes of raw log data. This session demonstrates how to combine Apache Airflow, Wazuh, and LLMs to build automated pipelines for smarter threat triage—grounded in the MITRE ATT&CK framework. We’ll explore how Airflow can orchestrate a full workflow: ingesting Wazuh alerts, using LLMs to summarize log events, matching behavior to ATT&CK tactics and techniques, and generating enriched incident summaries. With AI-powered interpretation layered on top of structured threat intelligence, teams can reduce manual effort while increasing context and clarity. You’ll learn how to build modular DAGs that automate: • Parsing and routing Wazuh alerts, • Querying LLMs for human-readable summaries, • Mapping IOCs to ATT&CK using vector similarity or prompt templates, • Outputting structured threat reports for analysts. The session includes a real-world example integrating open-source tools and public ATT&CK data, and will provide reusable components for rapid adoption. If you’re a SecOps engineer or ML practitioner in cybersecurity, this talk gives you a practical blueprint to deploy intelligent, scalable threat automation.
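
As a toy illustration of the "vector similarity" option mentioned above, the sketch below does a bag-of-words cosine match of an alert summary against paraphrased technique descriptions. A production pipeline would use embeddings and the full ATT&CK corpus; the two technique snippets here are my paraphrases, not official ATT&CK text.

```python
import math
from collections import Counter

# Toy ATT&CK matcher: cosine similarity over word counts.
# Technique descriptions are paraphrased for illustration.
TECHNIQUES = {
    "T1110 Brute Force": "repeated failed login password guessing attempts",
    "T1059 Command and Scripting Interpreter": "powershell shell script execution commands",
}

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_technique(alert_summary: str) -> str:
    """Return the technique whose description best matches the alert."""
    v = _vec(alert_summary)
    return max(TECHNIQUES, key=lambda t: cosine(v, _vec(TECHNIQUES[t])))
```

The same scoring step slots naturally into a DAG task between the LLM summarization task and the report-generation task.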

Designing Scalable Retrieval-Augmented Generation (RAG) Pipelines at SAP with Apache Airflow

2025-07-01
session

At SAP Business AI, we’ve transformed Retrieval-Augmented Generation (RAG) pipelines into enterprise-grade powerhouses using Apache Airflow. Our Generative AI Foundations Team developed a cutting-edge system that effectively grounds Large Language Models (LLMs) with rich SAP enterprise data. Powering Joule for Consultants, our innovative AI copilot, this pipeline manages the seamless ingestion, sophisticated metadata enrichment, and efficient lifecycle management of over a million structured and unstructured documents. By leveraging Airflow’s Dynamic DAGs, TaskFlow API, XCom, and Kubernetes Event-Driven Autoscaling (KEDA), we achieved unprecedented scalability and flexibility. Join our session to discover actionable insights, innovative scaling strategies, and a forward-looking vision for Pipeline-as-a-Service, empowering seamless integration of customer-generated content into scalable AI workflows.
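
SAP's ingestion code is not public; as a generic sketch of the chunk-and-enrich step in a RAG ingestion pipeline, the function below splits a document into overlapping character windows and attaches metadata for retrieval-time filtering. Chunk sizes and metadata fields are illustrative.

```python
# Generic RAG ingestion step: overlapping chunks plus metadata.
# Sizes and field names are illustrative, not SAP's implementation.
def chunk_document(doc_id: str, text: str, source: str,
                   size: int = 200, overlap: int = 50) -> list[dict]:
    """Split `text` into overlapping character windows with metadata."""
    step = size - overlap
    chunks = []
    for i, start in enumerate(range(0, max(len(text), 1), step)):
        piece = text[start:start + size]
        if not piece:
            break
        chunks.append({
            "chunk_id": f"{doc_id}-{i}",
            "text": piece,
            "source": source,
            "offset": start,
        })
    return chunks
```

In an Airflow pipeline, a per-document task like this pairs well with dynamic task mapping: one mapped embedding task per chunk batch.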

Empowering Precision Healthcare with Apache Airflow: iKang Healthcare Group’s DataHub Journey

2025-07-01
session

iKang Healthcare Group, serving nearly 10 million patients annually, built a centralized healthcare data hub powered by Apache Airflow to support its large-scale, real-time clinical operations. The platform integrates batch and streaming data in a lakehouse architecture, orchestrating complex workflows from data ingestion (HL7/FHIR) to clinical decision support. Healthcare data’s inherent complexity—spanning structured lab results to unstructured clinical notes—requires dynamic, reliable orchestration. iKang uses Airflow’s DAGs, extensibility, and workflow-as-code capabilities to address challenges like multi-system coordination, semantic data linking, and fault-tolerant automation. iKang extended Airflow with cross-DAG event triggers, task priority weights, LLM-driven clinical text processing, and a visual drag-and-drop DAG builder for medical teams. These innovations improved diagnostic turnaround, patient safety, and cross-system workflow visibility. iKang’s work demonstrates Airflow’s power in transforming healthcare data infrastructure and advancing intelligent, scalable patient care.

From Oops to Secure Ops: Self-Hosted AI for Airflow Failure Diagnosis

2025-07-01
session

Last year, ‘From Oops to Ops’ showed how AI-powered failure analysis could help diagnose why Airflow tasks fail. But do we really need large, expensive cloud-based AI models to answer simple diagnostic questions? Relying on external AI APIs introduces privacy risks, unpredictable costs, and latency, often without clear benefits for this use case. With the rise of distilled, open-source models, self-hosted failure analysis is now a practical alternative. This talk will explore how to deploy an AI service on infrastructure you control, compare cost, speed, and accuracy between OpenAI’s API and self-hosted models, and showcase a live demo of AI-powered task failure diagnosis using DeepSeek and Llama—running without external dependencies to keep data private and costs predictable.
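
The demo's prompt is not reproduced here; as a minimal sketch of preparing a failure log for a small self-hosted model, the function below keeps only the log tail (where the traceback usually is) and wraps it in a diagnostic prompt. The wording and line limit are assumptions.

```python
# Hypothetical prompt builder for self-hosted failure diagnosis.
# Keeping only the log tail bounds token cost for a small local model.
def build_diagnosis_prompt(task_id: str, log_text: str, max_lines: int = 40) -> str:
    """Return a prompt asking the model to explain the task failure."""
    tail = "\n".join(log_text.splitlines()[-max_lines:])
    return (
        f"The Airflow task '{task_id}' failed. Based on the log excerpt "
        f"below, state the most likely root cause and a suggested fix.\n\n"
        f"--- log tail ---\n{tail}"
    )
```

Because everything stays on infrastructure you control, the log never leaves your network, which is the privacy argument the talk makes for self-hosting.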

How Pinterest Uses AI to Empower Airflow Users for Troubleshooting

2025-07-01
session

At Pinterest, there are over 10,000 DAGs supporting various use cases across different teams and roles. With this scale and diversity, user support has been an ongoing challenge to unlock productivity. As Airflow increasingly serves as a user interface to a variety of data and ML infrastructure behind the scenes, it’s common for issues from multiple areas to surface in Airflow, making triage and troubleshooting a challenge. In this session, we will discuss the scale of the problem we are facing, how we have addressed it so far, and how we are introducing LLMs to help solve this problem.

LLM-Powered Review Analysis: Optimising Data Engineering using Airflow

2025-07-01
session

A real-world journey of how my small team at Xena Intelligence built robust data pipelines for our enterprise customers using Airflow. If you’re a data engineer, or part of a small team, this talk is for you. Learn how we orchestrated a complex workflow to process millions of public reviews. What you’ll learn: • Cost-efficient DAG design: decomposing complex processes into atomic tasks using the TaskFlow API, XComs, mapped tasks, and task groups, with a dive into one of our DAGs as a concrete example of how this approach optimizes parallelism, error handling, delivery speed, and reliability. • Integrating LLM analysis: how we integrated LLM-based analysis into our pipeline, and how we designed the database, queries, and ingestion into Postgres. • Extending the Airflow UI: a custom Airflow UI plugin that filters and visualizes DAG runs by customer, product, and marketplace, delivering clear insights for faster troubleshooting. • Leveraging the Airflow REST API: how we use the API to trigger DAGs on demand, elevating the UX by tracking mapped DAG progress and computing ETAs. • CI/CD and cost management: practical tips for deploying DAGs with CI/CD.
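
The team's batching scheme is not specified; as a pure-Python sketch of the fan-out step behind Airflow's dynamic task mapping, the helper below partitions a large review set into fixed-size batches, one mapped task instance per batch. The batch size is illustrative.

```python
# Fan-out helper for dynamic task mapping: each batch becomes one
# mapped task instance via `task.expand(batch=make_batches(...))`.
# The batch size is an illustrative default.
def make_batches(review_ids: list[int], batch_size: int = 1000) -> list[list[int]]:
    """Partition review IDs into contiguous batches."""
    return [review_ids[i:i + batch_size]
            for i in range(0, len(review_ids), batch_size)]
```

Batching before mapping, rather than mapping one task per review, keeps the number of task instances (and scheduler load) bounded while still parallelizing the LLM analysis.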

Model Context Protocol with Airflow

2025-07-01
session

In today’s data-driven world, effective workflow management and AI are crucial for success. However, there is a notable gap between Airflow and AI, and our presentation offers a solution to close it: an MCP (Model Context Protocol) server acting as a bridge. We’ll dive into two paths: • AI-augmented Airflow: enhancing Airflow with AI to improve error handling, automate DAG generation, proactively detect issues, and optimize resource use. • Airflow-powered AI: utilizing Airflow’s reliability to empower LLMs in executing complex tasks, orchestrating AI agents, and supporting decision-making with real-time data. Key takeaways: understanding how to integrate AI insights directly into your workflow orchestration; learning how MCP empowers AI with robust orchestration capabilities, offering full logging, monitoring, and auditability; and gaining insight into how to transform LLMs from reactive responders into proactive, intelligent, and reliable executors. We invite you to explore how MCP can help workflow management, making AI-driven decisions more reliable and turning workflow systems into intelligent, autonomous agents.
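
As a toy illustration of the MCP idea (not the MCP wire format), the sketch below shows the core shape: a server registers named tools, and a model invokes them by name with structured arguments. Tool names and return values are invented; a real server would call Airflow's REST API inside the handlers.

```python
# Toy tool registry and dispatcher in the spirit of an MCP server.
# Tool names and payloads are illustrative; this is not the MCP protocol.
TOOLS = {}

def tool(name):
    """Decorator registering a function as a named, model-callable tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("trigger_dag")
def trigger_dag(dag_id: str) -> dict:
    # A real server would POST to Airflow's REST API here.
    return {"dag_id": dag_id, "state": "queued"}

@tool("get_dag_status")
def get_dag_status(dag_id: str) -> dict:
    # A real server would GET the latest DAG run from Airflow here.
    return {"dag_id": dag_id, "state": "success"}

def handle_tool_call(name: str, arguments: dict) -> dict:
    """Dispatch a model-issued tool call; unknown tools return an error."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**arguments)
```

Routing every model action through a dispatcher like this is what gives the "full logging, monitoring, and auditability" the abstract promises: each call is a discrete, recordable event.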

Scaling Airflow at OpenAI

2025-07-01
session

This talk shares how we scaled and hardened OpenAI’s Airflow deployment to orchestrate thousands of workflows on Kubernetes. We’ll cover key architecture choices, scaling strategies, and reliability improvements - along with practical lessons learned.

Vayu: The Airflow Copilot

2025-07-01
session

Vayu is a conversational copilot for Apache Airflow, developed at Prevalent AI to help data engineers manage, troubleshoot, and fix pipelines using natural language. Deployments often fail silently due to misconfigurations, missing connections, or runtime issues that are impossible to identify in unit tests. Vayu tackles these via a troubleshooting agent that inspects logs, metrics, configs, and runtime state to find root causes and suggest fixes, saving engineers significant troubleshooting time. It can also apply approved fixes to DAG code and commit them to your version control system. Key capabilities: • Troubleshooting Agent: inspects logs, configs, variables, and connections to find root causes and suggest fixes. • Pipeline Mechanic Agent: suggests code-level fixes (e.g., missing connections or bad imports) and, once approved, commits them to version control. • DAG Manager Agent: understands DAG logic, suggests improvements, and can trigger DAGs conversationally. Architecture: built with open-source tools, including the Google ADK as the orchestration layer and a custom Airflow MCP server based on the FastMCP framework. LLMs never access Airflow directly. The full codebase will be open-sourced.
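
Vayu's internals are not shown in the abstract; as a hedged sketch of the cheapest layer such a troubleshooting agent can have, the snippet below tries regex rules that map common Airflow log errors to suggested fixes before any LLM call is needed. The patterns and advice strings are illustrative.

```python
import re

# Illustrative rule layer for failure triage: pattern -> suggested fix.
# Patterns approximate common Airflow error messages; advice is generic.
RULES = [
    (re.compile(r"isn't defined|Connection .* not found"),
     "Create the missing Airflow Connection (Admin > Connections)."),
    (re.compile(r"ModuleNotFoundError: No module named '[\w.]+'"),
     "Install the missing package in the worker image."),
    (re.compile(r"Negsignal\.SIGKILL|OOMKilled"),
     "Task was killed; raise its memory request/limit."),
]

def suggest_fix(log_text):
    """Return the first matching suggestion, or None if no rule fires."""
    for pattern, advice in RULES:
        if pattern.search(log_text):
            return advice
    return None
```

Only logs that no rule matches need to reach the LLM-backed agent, which keeps the expensive path for the genuinely novel failures.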