Want to take your Apache Airflow DAGs to the next level? This is an insightful session where we’ll uncover five transformative strategies to enhance your data workflows. Whether you’re a data engineering pro or just getting started, this presentation is packed with practical tips and actionable insights that you can apply right away. We’ll dive into the magic of using powerful libraries like Pandas, share techniques for trimming down data volumes for faster processing, and highlight the importance of modularizing your code for easier maintenance. Plus, you’ll discover efficient ways to monitor and debug your DAGs, and how to make the most of Airflow’s built-in features. By the end of this session, you’ll have a toolkit of strategies to boost the efficiency and performance of your DAGs, making your data processing tasks smoother and more effective. Don’t miss out on this opportunity to elevate your Airflow DAGs!
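To make two of those strategies concrete, here is a minimal sketch (not taken from the session) of a TaskFlow DAG that uses Pandas and trims data volume by reading only the columns it needs; the file path and column names are hypothetical.

```python
# A minimal sketch of "use Pandas" + "trim data volume" inside a TaskFlow DAG.
# The CSV path and column names are hypothetical examples.
import pandas as pd
import pendulum
from airflow.decorators import dag, task


@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def pandas_trim_example():
    @task
    def extract() -> str:
        # In practice this might pull a file from object storage; a local path keeps the sketch simple.
        return "/tmp/events.csv"

    @task
    def transform(path: str) -> int:
        # Read only the columns needed downstream instead of the full table.
        df = pd.read_csv(path, usecols=["user_id", "event_type"])
        return int(df["user_id"].nunique())

    transform(extract())


pandas_trim_example()
```

Reading only the required columns keeps memory usage down and speeds up downstream processing, which is the spirit of the "trim data volumes" tip above.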
There was a post on the data engineering subreddit recently that discussed how difficult it is to keep up with the data engineering world. Did you learn Hadoop? Great, we are on Snowflake, BigQuery, and Databricks now. Just learned Airflow? Well, now we have Airflow 3.0. And the list goes on. But what doesn’t change, and what have the lessons been over the past decade? That’s what I’ll be covering in this talk: real lessons and realities that come up time and time again, whether you’re working for a start-up or a large enterprise.
In today’s dynamic data environments, tables and schemas are constantly evolving, and keeping semantic layers up to date has become a critical operational challenge. Manual updates don’t scale, and delays can quickly lead to broken dashboards, failed pipelines, and lost trust. We’ll show how to harness Apache Airflow 3 and its new event-driven scheduling capabilities to automate the entire lifecycle: detecting table and schema changes in real time, parsing and interpreting those changes, and shifting left the updating of semantic models across dbt, Looker, or custom metadata layers. AI agents will add intelligence and automation that rationalize schema diffs, assess the impact of changes, and propose targeted updates to semantic layers, reducing manual work and minimizing the risk of errors. We’ll dive into strategies for efficient change detection, safe incremental updates, and orchestrating workflows where humans collaborate with AI agents to validate and deploy changes. By the end of the session, you’ll understand how to build resilient, self-healing semantic layers that minimize downtime, reduce manual intervention, and scale effortlessly across fast-changing data environments.
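As a rough illustration of the event-driven piece (not the presenters’ implementation), the sketch below assumes Airflow 3’s `airflow.sdk` imports and a hypothetical `warehouse_schema` asset: one DAG emits an asset event after scanning for schema changes, and a second DAG scheduled on that asset refreshes the semantic layer.

```python
# Minimal sketch: asset-based scheduling in Airflow 3.
# The asset name and the task bodies are hypothetical placeholders.
import pendulum
from airflow.sdk import Asset, dag, task

schema_asset = Asset("warehouse_schema")


@dag(schedule="@hourly", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def detect_schema_changes():
    @task(outlets=[schema_asset])
    def scan_information_schema():
        # Placeholder: diff current columns against a stored snapshot;
        # on success this task emits an event for the outlet asset.
        pass

    scan_information_schema()


@dag(schedule=schema_asset, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def refresh_semantic_layer():
    @task
    def update_semantic_models():
        # Placeholder: regenerate dbt/Looker metadata for the changed tables.
        pass

    update_semantic_models()


detect_schema_changes()
refresh_semantic_layer()
```

Airflow 3 can also attach watchers to assets so that external events, rather than a polling DAG, produce the asset events that kick off the downstream refresh.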
Curious how code truly flows inside Airflow? Join me for a unique visualisation journey into Airflow’s inner workings (the first of its kind): the code blocks and modules called when certain operations are running. A walkthrough that unveils task execution, observability, and debugging like never before. You’ll also see Airflow scaling in action, with a performance comparison between Airflow 3 and Airflow 2. This session will demystify Airflow’s architecture, showcasing real-time task flows and the heartbeat of pipelines in action. Perfect for engineers looking to optimize workflows, troubleshoot efficiently, and gain a new perspective on Airflow’s powerful upgraded core. See Airflow running live with detailed insights and unlock the secrets to better pipeline management!
Are you looking to build slick, dynamic trigger forms for your DAGs? It all starts with mastering params. Params are the gold standard for adding execution options to your DAGs, allowing you to create dynamic, user-friendly trigger forms with descriptions, validation, and now, with Airflow 3, bidirectional support for conf data! In this talk, we’ll break down how to use params effectively, share best practices, and explore what’s new since the 2023 Airflow Summit talk (https://airflowsummit.org/sessions/2023/flexible-dag-trigger-forms-aip-50/). If you want to make DAG execution more flexible, intuitive, and powerful, this session is a must-attend!
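For readers new to params, here is a minimal sketch of a params-driven trigger form; the param names, choices, and bounds are hypothetical examples, not content from the talk.

```python
# Minimal sketch: params rendered as a trigger form with a dropdown and
# JSON-schema validation. Names and values are hypothetical.
import pendulum
from airflow.decorators import dag, task
from airflow.models.param import Param


@dag(
    schedule=None,
    start_date=pendulum.datetime(2024, 1, 1),
    catchup=False,
    params={
        "environment": Param(
            "staging",
            type="string",
            enum=["staging", "production"],
            description="Target environment, shown as a dropdown in the trigger form.",
        ),
        "batch_size": Param(
            1000,
            type="integer",
            minimum=1,
            maximum=100000,
            description="Validated against these bounds before the run starts.",
        ),
    },
)
def params_form_example():
    @task
    def report(params=None):
        # The resolved params are available from the task context.
        print(f"Running against {params['environment']} with batch size {params['batch_size']}")

    report()


params_form_example()
```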
In Airflow 2 there was a plugin mechanism to extend the UI with new functions, as well as to add hooks and other features. Because Airflow 3 rewrote the UI, old plugins no longer work in all cases. Airflow 3.1 now provides a revamped option to extend the UI with a new plugin schema based on native React components and embedded iframes, following the AIP-68 definitions. In this session we will provide an overview of the capabilities and an introduction to how you can roll your own.
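As a rough sense of the shape such a plugin can take (the attribute and key names below are assumptions based on AIP-68, so check the Airflow 3.1 plugin documentation for the exact schema), a plugin registering an embedded iframe view might look roughly like this:

```python
# Rough sketch only: the `external_views` attribute and its keys are assumed
# from AIP-68 and may differ from the final Airflow 3.1 schema.
from airflow.plugins_manager import AirflowPlugin


class MyUiPlugin(AirflowPlugin):
    name = "my_ui_plugin"

    # Assumed AIP-68 style: an embedded iframe reachable from the navigation.
    external_views = [
        {
            "name": "Team Runbook",
            "href": "https://internal.example.com/runbook",  # hypothetical URL
            "destination": "nav",
        }
    ]
```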
Airflow has been used by many companies as a core part of their internal data platform. Would you be interested in finding out how Airflow could play a pivotal role in achieving data engineering excellence and efficiency using modern data architecture? The best practices, tools, and setup to achieve a stable yet cost-effective way of running small or big workloads: let’s find out! In this workshop we will review how an organisation can set up a data platform architecture around Airflow and the necessary requirements: Airflow and its role in the data platform; different ways to organise Airflow environments enabling scalability and stability; useful open source libraries and custom plugins allowing efficiency; how to manage multi-tenancy and cost savings; and challenges and factors to keep in mind, using a Success Matrix! This workshop should be suitable for any architect, data engineer, or DevOps engineer aiming to build or enhance their internal data platform. At the end of this workshop you will have a solid understanding of the initial setup and of ways to optimise further, getting the most out of the tool for your own organisation.
What if your Airflow tasks could understand natural language AND adapt to schema changes automatically, while maintaining the deterministic, observable workflows we rely on? This talk introduces practical patterns for AI-native orchestration that preserve Airflow’s strengths while adding intelligence where it matters most. Through a real-world example, we’ll demonstrate AI-powered tasks that detect schema drift across multi-cloud systems and perform context-aware data quality checks that go beyond simple validation: understanding business rules, detecting anomalies, and generating validation queries from prompts like “check data quality across regions.” All within static DAG structures you can test and debug normally. We’ll show how AI becomes a first-class citizen by combining Airflow features (assets for schema context, Human-in-the-Loop for approvals, and AssetWatchers for automated triggers) with engines such as Apache DataFusion for high-performance query execution and support for cross-cloud data processing with unified access to multiple storage formats. These patterns apply directly to schema validation and similar cases where natural language can simplify complex operations. This isn’t about bolting AI onto Airflow. It’s about evolving how we build workflows, from brittle rules to intelligent adaptation, while keeping everything testable, auditable, and production-ready.
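To illustrate the general pattern only (this is not the speakers’ code), the sketch below keeps the DAG structure static while a single task calls hypothetical helpers: `generate_validation_sql` standing in for an LLM call and `run_query` standing in for a query engine such as DataFusion.

```python
# Minimal sketch of "AI inside a static DAG": the structure stays deterministic
# and testable; intelligence lives inside an individual task. Both helper
# functions are hypothetical placeholders.
import pendulum
from airflow.decorators import dag, task


def generate_validation_sql(prompt: str) -> str:
    # Hypothetical: call an LLM to turn a natural-language check into SQL.
    return "SELECT region, COUNT(*) AS n FROM orders GROUP BY region"


def run_query(sql: str) -> list:
    # Hypothetical: execute the SQL against a query engine.
    return []


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def ai_quality_checks():
    @task
    def check_quality():
        sql = generate_validation_sql("check data quality across regions")
        rows = run_query(sql)
        if not rows:
            # Surface problems as a normal, observable task failure.
            raise ValueError("Quality check returned no rows")

    check_quality()


ai_quality_checks()
```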
In this keynote, Peeyush Rai and Vikram Koka will walk through how Airflow is being used as part of an Agentic AI platform serving insurance companies, which runs on all the major public clouds and leverages models from OpenAI, Google (Gemini), and AWS (Claude and Bedrock). This talk walks through the details of the actual end-user business workflow, including gathering relevant financial data to make a decision, as well as the tricky challenge of handling AI hallucinations with new Airflow capabilities such as “Human in the loop”. This talk offers something for both business and technical audiences. Business users will get a clear view of what it takes to bring an AI application into production and how to align their operations and business teams with an AI-enabled workflow. Meanwhile, technical users will walk away with practical insights on how to orchestrate complex business processes, enabling seamless collaboration between Airflow, AI agents, and humans in the loop.
The workflow orchestration team at Zoox aims to build a solution for orchestrating heterogeneous workflows encompassing data, ML, and QA pipelines. We have encountered two primary challenges: first, the steep learning curve for new Airflow users and the need for a user-friendly yet scalable development process; second, integrating and migrating existing pipelines with established solutions. This presentation will detail our approach, as a small team at Zoox, to addressing these challenges. We will start with an introduction to Zoox and what we do, then walk down memory lane through the past and present of Airflow use at Zoox. We will then share our strategies for simplifying the Airflow DAG creation process and enhancing the user experience. Lastly, we will share a few thoughts on how to grow the team and Airflow’s presence at Zoox in the future.
Apache Bigtop is a time-proven open-source software stack for building data platforms, built around the Hadoop and Spark ecosystem since 2011. Its software composition has changed over that long period, and the job scheduler was recently removed, mainly due to the inactivity of its development. The speaker believes that Airflow fits this gap perfectly and is proposing incorporating it into the Bigtop stack. This presentation will introduce how easily users can build a data platform with Bigtop including Airflow, and how Airflow can integrate that software with its wide range of providers and enterprise-readiness features such as Kerberos support.
Join us to explore the DAG Upgrade Agent. Developed with the Google Agent Development Kit and powered by Gemini, the DAG Upgrade Agent uses a rules-based framework to analyze DAG code, identify compatibility issues between core Airflow and provider package versions, and generate precise upgrade recommendations and automated code conversions. Perfect for upcoming Airflow 3.0 migrations.
Airflow 3 introduced a game-changing feature: Dag versioning. Gone are the days of “latest only” Dags and confusing, inconsistent UI views when pipelines change mid-flight. This talk covers: visualizing Dag changes over time in the UI; how Dag code is versioned and can be grabbed from external sources; executing a whole Dag run against the same code version; and dynamic Dags (where do they fit in?!). You’ll see real-world scenarios, UI demos, and learn how these advancements will help avoid “Airflow amnesia”.
KP Division of Research uses Airflow as a central technology for integrating diverse technologies in an agile setting. We wish to present a set of use cases for AI/ML workloads, including imaging analysis (tissue segmentation, mammography), NLP (early identification of psychosis), LLM processing (identification of vessel diameter from radiological impressions), and other large data processing tasks. We create these “short-lived” project workflows to accomplish specific aims, and then may never run the job again, so leveraging generalized patterns is crucial to quickly implementing these jobs. Our Advanced Computational Infrastructure comprises multiple Kubernetes clusters, and we use Airflow to democratize the use of our batch-level resources in those clusters. We use Airflow form-based parameters to deploy pods running R and Python scripts, where generalized parameters are injected into scripts that follow internal programming patterns. Finally, we also leverage Airflow to create headless services inside Kubernetes for large computational workloads (Spark & H2O) that subsequent pods consume ephemerally.
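A minimal sketch of that parameter-injection pattern (not KP’s actual code): form-based params templated into a KubernetesPodOperator from the cncf.kubernetes provider, with a hypothetical image, script name, and table name.

```python
# Minimal sketch: trigger-form params injected into a pod command and
# environment. Image, script, and table names are hypothetical.
import pendulum
from airflow.decorators import dag
from airflow.models.param import Param
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator


@dag(
    schedule=None,
    start_date=pendulum.datetime(2024, 1, 1),
    catchup=False,
    params={
        "input_table": Param("research.cohort_a", type="string", description="Source table for the analysis."),
        "script": Param("segmentation.R", type="string", description="Script name following the internal pattern."),
    },
)
def k8s_research_job():
    KubernetesPodOperator(
        task_id="run_analysis",
        name="run-analysis",
        image="registry.example.com/research-r:latest",  # hypothetical image
        cmds=["Rscript", "/scripts/{{ params.script }}"],
        env_vars={"INPUT_TABLE": "{{ params.input_table }}"},
    )


k8s_research_job()
```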
One of the exciting new features in Airflow 3 is internationalization (i18n), bringing multilingual support to the UI and making Airflow more accessible to users worldwide. This talk will highlight the UI changes made to support different languages, including locale-aware adjustments. We’ll discuss how translations are contributed and managed — including the use of LLMs to accelerate the process — and why human review remains an essential part of it. We’ll present the i18n policy designed to ensure long-term maintainability, along with the tooling developed to support it. Finally, you’ll learn how to get involved and contribute to Airflow’s global reach by translating or reviewing content in your language.
As Apache Airflow adoption accelerates for data pipeline orchestration, integrating it effectively into your enterprise’s Automation Center of Excellence (CoE) is crucial for maximizing ROI, ensuring governance, and standardizing best practices. This session explores common challenges faced when bringing specialized tools like Airflow into a broader CoE framework. We’ll demonstrate how leveraging enterprise automation platforms like Automic Automation can simplify this integration by providing centralized orchestration, standardized lifecycle management, and unified auditing for Airflow DAGs alongside other enterprise workloads. Furthermore, discover how Automation Analytics & Intelligence (AAI) can offer the CoE a single pane of glass for monitoring performance, tracking SLAs, and proving the business value of Airflow initiatives within the complete automation landscape. Learn practical strategies to ensure Airflow becomes a well-governed, high-performing component of your overall automation strategy.
This session will detail the journey of Allegro, a leading e-commerce company in Poland, with Apache Airflow. It will chart our evolution from a custom, on-premises Airflow-as-a-Service solution through a significant expansion to over 300 Cloud Composer instances in Google Cloud, culminating in Airflow becoming the core of our data processing. We orchestrate over 64,000 regular tasks spanning over 6,000 active DAGs on more than 200 Airflow instances, from feeding business-supporting dashboards to managing main data marts, handling ML pipelines, and more. We will share our practical experiences, lessons learned, and the strategies employed to manage and scale this critical infrastructure. Furthermore, we will introduce our innovative economy-of-share approach for providing ready-to-use Airflow environments, significantly enhancing both user productivity and cost efficiency.
The general-purpose nature of Airflow has always left us questioning, “Is this the right way?” While the existing resources and community cover the fundamentals, new Airflow releases always leave us wondering if there is more. This talk reveals how 3.0’s innovations redefine best practices for building production-ready data platforms:
• Dag development: future-proof your Dags without compromising on fundamentals
• Modern pipelines: how to best incorporate new Airflow features
• Infrastructure: leveraging 3.0’s service-oriented architecture and Edge Executor
• Teams & responsibilities: streamlined operations with the new split CLI and improved UI
• Monitoring & observability: building fail-proof pipelines
Red Hat’s unified data and AI platform relies on Apache Airflow for orchestration, alongside Snowflake, Fivetran, and Atlan. The platform prioritizes building a dependable data foundation, recognizing that effective AI depends on quality data. Airflow was selected for its predictability, extensive connectivity, reliability, and scalability. The platform now supports business analytics, transitioning from ETL to ELT processes. This has resulted in a remarkable improvement in how we make data available for business decisions. The platform’s capabilities are being extended to power Digital Workers (AI agents) using large language models, encompassing model training, fine-tuning, and inference. Two Digital Workers are currently deployed, with more in development. This presentation will detail the rationale and background of this evolution, followed by an explanation of the architectural decisions made and the challenges encountered and resolved throughout the process of transforming into an AI-enabled data platform to power Red Hat’s business.