dbt

Agentic AI Automating Semantic Layer Updates with Airflow 3

2025-07-01 · Airflow Summit 2025

session

by Soren Archibald, Andres Astorga Espriella

AI/ML Airflow Looker

In today’s dynamic data environments, tables and schemas are constantly evolving, and keeping semantic layers up to date has become a critical operational challenge. Manual updates don’t scale, and delays can quickly lead to broken dashboards, failed pipelines, and lost trust. We’ll show how to harness Apache Airflow 3 and its new event-driven scheduling capabilities to automate the entire lifecycle: detecting table and schema changes in real time, parsing and interpreting those changes, and shifting semantic-model updates left across dbt, Looker, or custom metadata layers. AI agents will add intelligence and automation: they rationalize schema diffs, assess the impact of changes, and propose targeted updates to semantic layers, reducing manual work and minimizing the risk of errors. We’ll dive into strategies for efficient change detection, safe incremental updates, and orchestrating workflows where humans collaborate with AI agents to validate and deploy changes. By the end of the session, you’ll understand how to build resilient, self-healing semantic layers that minimize downtime, reduce manual intervention, and scale effortlessly across fast-changing data environments.
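The abstract does not describe how schema changes are detected or classified, so the following is only a minimal sketch of the idea of a schema diff. It assumes a schema snapshot can be represented as a plain mapping of column name to type string; `SchemaDiff`, `diff_schemas`, and the "removed or retyped means breaking" rule are hypothetical names and choices made for illustration, not anything from the session.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    """Column-level differences between two snapshots of one table (hypothetical shape)."""
    added: dict[str, str] = field(default_factory=dict)
    removed: dict[str, str] = field(default_factory=dict)
    retyped: dict[str, tuple[str, str]] = field(default_factory=dict)

    @property
    def is_breaking(self) -> bool:
        # Assumption for this sketch: dropped or retyped columns can break
        # downstream semantic models, while purely added columns cannot.
        return bool(self.removed or self.retyped)


def diff_schemas(old: dict[str, str], new: dict[str, str]) -> SchemaDiff:
    """Compare two {column: type} snapshots and collect the differences."""
    diff = SchemaDiff()
    for column, new_type in new.items():
        if column not in old:
            diff.added[column] = new_type
        elif old[column] != new_type:
            diff.retyped[column] = (old[column], new_type)
    for column, old_type in old.items():
        if column not in new:
            diff.removed[column] = old_type
    return diff
```

A result like this could be what gets handed to a reviewer (human or agent) to decide which semantic-layer definitions need updating; that hand-off is outside the scope of the sketch.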

Boosting dbt-core workflows performance with Airflow’s Deferrable capabilities

2025-07-01 · Airflow Summit 2025

session

by Pankaj Singh, Pankaj Koti, Tatiana Al-Chueyr Martins (Astronomer)

Airflow Astronomer Cloud Computing Cosmos GitHub

Efficiently handling long-running workflows is crucial for scaling modern data pipelines. Apache Airflow’s deferrable operators help offload tasks during idle periods, freeing worker slots while still tracking progress. This session explores how Cosmos 1.9 ( https://github.com/astronomer/astronomer-cosmos ) integrates Airflow’s deferrable capabilities to improve the orchestration of dbt ( https://github.com/dbt-labs/dbt-core ) in production, with insights from the recent contributions that introduced this functionality.
Key takeaways:
- Deferrable Operators: How they work and why they’re ideal for long-running dbt tasks.
- Integrating with Cosmos: Refactoring and enhancements to enable deferrable behaviour across platforms.
- Performance Gains: Resource savings and task throughput improvements from deferrable execution.
- Challenges & Future Enhancements: Lessons learned, compatibility, and ideas for broader support.
Whether orchestrating dbt models on a cloud warehouse or managing large-scale transformations, this session offers practical strategies to reduce resource contention and boost pipeline performance.
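To make the "freeing worker slots" idea concrete, here is a toy simulation in plain `asyncio`. It is not Airflow or Cosmos code: the semaphore stands in for a pool with a single worker slot, the `asyncio.sleep` stands in for waiting on an external job such as a warehouse query, and `run_task` and `simulate` are hypothetical names invented for this sketch.

```python
import asyncio


async def run_task(name: str, slots: asyncio.Semaphore, log: list[str], deferrable: bool) -> None:
    """Submit a long-running external job and wait for it to finish."""
    async with slots:  # occupy the worker slot
        log.append(f"{name} submitted")
        if not deferrable:
            # Blocking style: keep the slot for the whole wait.
            await asyncio.sleep(0.01)
            log.append(f"{name} finished")
            return
    # Deferred style: the slot is already released while the job runs elsewhere.
    await asyncio.sleep(0.01)
    async with slots:  # briefly take a slot again to finish up
        log.append(f"{name} finished")


def simulate(deferrable: bool) -> list[str]:
    """Run two tasks against one worker slot and return the event log."""
    async def main() -> list[str]:
        slots = asyncio.Semaphore(1)
        log: list[str] = []
        await asyncio.gather(*(run_task(n, slots, log, deferrable) for n in ("a", "b")))
        return log

    return asyncio.run(main())
```

In the blocking run, task `b` cannot even be submitted until `a` has finished; in the deferred run, both jobs are submitted before either finishes, even though only one slot exists. That overlap is the throughput gain the abstract refers to, shown here only in miniature.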

Driving Analytics with Open Source: Airbyte, dbt, Airflow & Metabase

2025-07-01 · Airflow Summit 2025

session

by Ayoade Adegbite

Airbyte Airflow Analytics Metabase postgresql

In this talk, I’ll walk through how we built an end-to-end analytics pipeline using open-source tools (Airbyte, dbt, Airflow, and Metabase). At WirePick, we extract data from multiple sources using Airbyte OSS into PostgreSQL, transform it into business-specific data marts with dbt, and automate the entire workflow using Airflow. Our Metabase dashboards provide real-time insights, and we integrate Slack notifications to alert stakeholders when key business metrics change.
This session will cover:
- Data extraction: Using Airbyte OSS to pull data from multiple sources
- Transformation & Modeling: How dbt helps create reusable data marts
- Automation & Orchestration: Managing the workflow with Airflow
- Data-driven decision-making: Delivering insights through Metabase & Slack alerts

Dynamic Data Pipelines with DBT and Airflow

2025-07-01 · Airflow Summit 2025

session

by Miquel Angel Andreu Febrer

Airflow CI/CD Data Quality

This session showcases Okta’s innovative approach to data pipeline orchestration with dbt and Airflow. We’ll show how we’ve implemented dynamically generated Airflow DAGs based on dbt’s dependency graph. This allows us to enforce strict data quality standards by automatically executing downstream model tests before upstream model deployments, effectively preventing error cascades. The entire CI/CD pipeline, from dbt model changes to production DAG deployment, is fully automated. The result? Accelerated development cycles, reduced operational overhead, and bulletproof data reliability.
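The abstract does not say how the dependency graph is read, so the following is a sketch under one assumption: that the graph comes from dbt’s `manifest.json` artifact, where each entry under `nodes` carries a `resource_type` and a `depends_on.nodes` list. The function name `model_run_order` is hypothetical, and the sketch only derives an execution order; turning each entry into an Airflow task is left out.

```python
from graphlib import TopologicalSorter


def model_run_order(manifest: dict) -> list[str]:
    """Return dbt model IDs ordered so every model follows its upstream models."""
    nodes = manifest["nodes"]
    # Keep only models; tests, seeds and snapshots are ignored in this sketch.
    models = sorted(
        node_id for node_id, node in nodes.items()
        if node.get("resource_type") == "model"
    )
    model_set = set(models)
    # Map each model to the upstream models it depends on.
    graph = {
        node_id: [
            dep
            for dep in nodes[node_id].get("depends_on", {}).get("nodes", [])
            if dep in model_set
        ]
        for node_id in models
    }
    # static_order() yields predecessors first and raises CycleError on cycles.
    return list(TopologicalSorter(graph).static_order())
```

The same `graph` mapping also gives the edges a generator would need: each key could become one task, with its listed dependencies set as upstream tasks.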

Event-Driven Airflow 3.0: Real-Time Orchestration with Pub/Sub

2025-07-01 · Airflow Summit 2025

session

by Andrea Bombino, Nawfel Bacha

Airflow Cloud Computing GCP Pub/Sub

Traditional time-based scheduling in Airflow can lead to inefficiencies and delays. With Airflow 3.0, we can now leverage native event-driven DAG execution, enabling workflows to trigger instantly when data arrives and eliminating polling-based sensors and rigid schedules. This talk explores real-time orchestration using Airflow 3.0 and Google Cloud Pub/Sub. We’ll showcase how to build an event-driven pipeline where DAGs automatically trigger as new data lands, ensuring faster and more efficient processing. Through a live demo, we’ll demonstrate how Airflow listens to Pub/Sub messages and dynamically triggers dbt transformations only when fresh data is available. This approach improves scalability, reduces costs, and enhances orchestration efficiency.
Key takeaways:
- How event-driven DAGs work vs. traditional scheduling
- Best practices for integrating Airflow with Pub/Sub
- Eliminating polling-based sensors for efficiency
- Live demo: event-driven pipeline with Airflow 3.0, Pub/Sub & dbt
This session will showcase how Airflow 3.0 enables truly real-time orchestration.

How Airflow solves the coordination of decentralised teams at Vinted

2025-07-01 · Airflow Summit 2025

session

by Rodrigo Loredo, Oscar Ligthart

AI/ML Airflow CI/CD

Vinted is the biggest second-hand marketplace in Europe with multiple business verticals. Our data ecosystem has over 20 decentralized teams responsible for generating, transforming, and building Data Products from petabytes of data. This creates a challenging environment where inter-team dependencies, varied expertise with scheduling tools, and diverse use cases need to be managed efficiently. To tackle these challenges, we have centralized our approach by leveraging Apache Airflow to orchestrate data dependencies across teams. In this session, we will present how we utilize a code generator to streamline the creation of Airflow code for numerous dbt repositories, dockerized jobs, and Vertex-AI pipelines. With this approach, we simplify the complexity and offer our users the flexibility required to accommodate their use cases. We will share our sensor-callback strategy, which we developed to manage task dependencies, overcoming the limitations of traditional dataset triggers. This approach requires a data asset registry to monitor global dependencies and SLOs, and serves as a safeguard during CI processes for detecting potential breaking changes.

Orchestrating Databricks with Airflow: Unlocking the Power of MVs, Streaming Tables, and AI

2025-07-01 · Airflow Summit 2025

session

by Tahir Fayyaz (Google Cloud Platform Team specialising in Data & Machine Learning, BigQuery expert), Shanelle Roman

AI/ML Airflow Analytics Databricks SQL Data Streaming

As data workloads grow in complexity, teams need seamless orchestration to manage pipelines across batch, streaming, and AI/ML workflows. Apache Airflow provides a flexible and open-source way to orchestrate Databricks’ entire platform, from SQL analytics with Materialized Views (MVs) and Streaming Tables (STs) to AI/ML model training and deployment. In this session, we’ll showcase how Airflow can automate and optimize Databricks workflows, reducing costs and improving performance for large-scale data processing. We’ll highlight how MVs and STs eliminate manual incremental logic, enable real-time ingestion, and enhance query performance—all while maintaining governance and flexibility. Additionally, we’ll demonstrate how Airflow simplifies ML model lifecycle management by integrating Databricks’ AI/ML capabilities into end-to-end data pipelines. Whether you’re a dbt user seeking better performance, a data engineer managing streaming pipelines, or an ML practitioner scaling AI workloads, this session will provide actionable insights on using Airflow and Databricks together to build efficient, cost-effective, and future-proof data platforms.

Orchestrating MLOps and Data Transformation at EDB with Airflow

2025-07-01 · Airflow Summit 2025

session

by Karthik Dulam

AI/ML Airflow Analytics Analytics Engineering Azure Cosmos Data Governance Data Quality MLOps

This talk explores EDB’s journey from siloed reporting to a unified data platform, powered by Airflow. We’ll delve into the architectural evolution, showcasing how Airflow orchestrates a diverse range of use cases, from Analytics Engineering to complex MLOps pipelines. Learn how EDB leverages Airflow and Cosmos to integrate dbt for robust data transformations, ensuring data quality and consistency. We’ll provide a detailed case study of our MLOps implementation, demonstrating how Airflow manages training, inference, and model monitoring pipelines for Azure Machine Learning models. Discover the design considerations driven by our internal data governance framework and gain insights into our future plans for AIOps integration with Airflow.

Productionising dbt-core with Airflow

2025-07-01 · Airflow Summit 2025

session

by Pankaj Singh, Pankaj Koti, Tatiana Al-Chueyr Martins (Astronomer)

Airflow Analytics Analytics Engineering Astronomer Cosmos

As a popular open-source library for analytics engineering, dbt is often combined with Airflow. Orchestrating and executing dbt models as DAGs adds a layer of control over tasks, improves observability, and provides a reliable, scalable environment to run dbt models. This workshop will cover a step-by-step guide to Cosmos, a popular open-source package from Astronomer that helps you quickly run your dbt Core projects as Airflow DAGs and Task Groups, all with just a few lines of code.
We’ll walk through:
- Running and visualising your dbt transformations
- Managing dependency conflicts
- Defining database credentials (profiles)
- Configuring source and test nodes
- Using dbt selectors
- Customising arguments per model
- Addressing performance challenges
- Leveraging deferrable operators
- Visualising dbt docs in the Airflow UI
- Example of how to deploy to production
- Troubleshooting
We encourage participants to bring their dbt project to follow this step-by-step workshop.

Simplifying Data Lineage: How OpenLineage Empowers Airflow and Beyond

2025-07-01 · Airflow Summit 2025

session

by Julien Le Dem (Astronomer), Harel Shein (Datadog)

Airflow Flink Spark

OpenLineage has simplified collecting lineage metadata across the data ecosystem by standardizing its representation in an extensible model. It has enabled a whole ecosystem that improves data pipeline reliability and eases troubleshooting in production environments. In this talk, we’ll briefly introduce the OpenLineage model and explore how this metadata is collected from Airflow, Spark, dbt, and Flink. We’ll demonstrate how to extract valuable insights and outline practical benefits and common challenges when building ingestion, processing, and storage for OpenLineage data. We will also briefly show how OpenLineage events can be used to observe data pipelines exhaustively and the benefits that brings.
