talk-data.com

Topic

Airflow

Apache Airflow

workflow_management data_orchestration etl

682 tagged

Activity Trend: 157 peak/qtr (2020-Q1 to 2026-Q1)

Activities

682 activities · Newest first

Summary: In this episode of the Data Engineering Podcast, Akshay Agrawal from Marimo discusses the innovative new Python notebook environment, which offers a reactive execution model, full Python integration, and built-in UI elements to enhance the interactive computing experience. He discusses the challenges of traditional Jupyter notebooks, such as hidden state and lack of interactivity, and how Marimo addresses these issues with features like reactive execution and a Python-native file format. Akshay also explores the broader landscape of programmatic notebooks, comparing Marimo to other tools like Jupyter, Streamlit, and Hex, highlighting its unique approach to creating data apps directly from notebooks and eliminating the need for separate app development. The conversation delves into the technical architecture of Marimo, its community-driven development, and future plans, including a commercial offering and enhanced AI integration, emphasizing Marimo's role in bridging the gap between data exploration and production-ready applications.
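As a rough illustration of the Python-native file format and built-in UI elements mentioned above, a marimo notebook is just a Python file along these lines. This is a minimal sketch: the exact generated boilerplate and cell-function names vary between marimo versions, and the slider/threshold example is made up.

```python
import marimo

app = marimo.App()


@app.cell
def _():
    import marimo as mo
    return (mo,)


@app.cell
def _(mo):
    # A built-in UI element; the last expression in a cell is rendered as output.
    threshold = mo.ui.slider(0, 100, value=25, label="threshold")
    threshold
    return (threshold,)


@app.cell
def _(threshold):
    # This cell depends on `threshold`, so marimo reactively re-runs it on each change.
    message = f"Current threshold: {threshold.value}"
    message
    return (message,)


if __name__ == "__main__":
    app.run()
```

Because the file is plain Python, it can be versioned, diffed, and tested like any other module, which is the property the episode contrasts with Jupyter's JSON notebooks.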

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to 6x while guaranteeing accuracy? Datafold's Migration Agent is the only AI-powered solution that doesn't just translate your code; it validates every single data point to ensure perfect parity between your old and new systems. Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multi-system migrations, they deliver production-ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they're turning months-long migration nightmares into week-long success stories.
Your host is Tobias Macey and today I'm interviewing Akshay Agrawal about Marimo, a reusable and reproducible Python notebook environment.

Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Marimo is and the story behind it?
What are the core problems and use cases that you are focused on addressing with Marimo?
What are you explicitly not trying to solve for with Marimo?
Programmatic notebooks have been around for decades now. Jupyter was largely responsible for making them popular outside of academia. How have the applications of notebooks changed in recent years?
What are the limitations that have been most challenging to address in production contexts?
Jupyter has long had support for multi-language notebooks/notebook kernels. What is your opinion on the utility of that feature as a core concern of the notebook system?
Beyond notebooks, Streamlit and Hex have become quite popular for publishing the results of notebook-style analysis. How would you characterize the feature set of Marimo for those use cases?
For a typical data team that is working across data pipelines, business analytics, ML/AI engineering, etc., how do you see Marimo applied within and across those contexts?
One of the common difficulties with notebooks is that they are largely a single-player experience. They may connect into a shared compute cluster for scaling up execution (e.g. Ray, Dask, etc.). How does Marimo address the situation where a data platform team wants to offer notebooks as a service to reduce the friction to getting started with analyzing data in a warehouse/lakehouse context?
How are you seeing teams integrate Marimo with orchestrators (e.g. Dagster, Airflow, Prefect)?
What are some of the most interesting or complex engineering challenges that you have had to address while building and evolving Marimo?
What are the most interesting, innovative, or unexpected ways that you have seen Marimo used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Marimo?
When is Marimo the wrong choice?
What do you have planned for the future of Marimo?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used.
The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links: Marimo, Jupyter, IPython, Streamlit (Podcast.init Episode), Vector Embeddings, Dimensionality Reduction, Kaggle, Pytest, PEP 723 script dependency metadata, MatLab, Visicalc, Mathematica, RMarkdown, RShiny, Elixir Livebook, Databricks Notebooks, Papermill, Pluto (Julia notebook), Hex, Directed Acyclic Graph (DAG), Sumble (Kaggle founder Anthony Goldblum's startup), Ray, Dask, Jupytext, nbdev, DuckDB (Podcast Episode), Iceberg, Superset, jupyter-marimo-proxy, JupyterHub, Binder, Nix, AnyWidget, Jupyter Widgets, Matplotlib, Altair, Plotly, DataFusion, Polars, MotherDuck

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Observability in data workflows often stops at logs and metrics, leaving data lineage as a blind spot. At Booking, we set out to change that by treating lineage as a core observability layer. In this talk, I'll walk through how we integrated lineage tracking into our Airflow ecosystem, what metadata we capture, and how we surface it to users in a meaningful way. I'll also share how lineage data helps us debug failures, detect unexpected changes, and ensure compliance. You'll leave with a practical view of what it takes to make lineage not just visible, but actionable.
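For readers unfamiliar with where lineage metadata can come from in Airflow, here is a minimal, hypothetical sketch (not Booking's implementation): task-level inlets and outlets declared on assets are one common source that lineage backends, such as the OpenLineage provider, can surface as dataset-level edges. Asset URIs and task logic below are invented.

```python
# Hypothetical sketch of task-level lineage metadata in Airflow 3; not Booking's setup.
from airflow.sdk import dag, task, Asset

raw_bookings = Asset("s3://raw/bookings/")      # hypothetical upstream dataset
curated_bookings = Asset("curated_bookings")    # hypothetical downstream table


@dag(schedule=None)
def bookings_curation():

    @task(inlets=[raw_bookings], outlets=[curated_bookings])
    def curate():
        # Transformation goes here; the inlets/outlets above are the metadata a
        # lineage backend can collect and surface to users for debugging and compliance.
        ...

    curate()


bookings_curation()
```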

Running Airflow at scale for thousands of workflows across multiple teams introduces challenges around standardization, governance, and isolation. At Booking, we've built a multi-tenant Airflow platform that serves over 4,000 workflows using a custom DSL defined in workflow.yaml files. In this talk, I'll show how we use automated DAG generation to bring structure to complexity, how we achieved horizontal scalability by decoupling orchestration from execution, and how reusable step templates help us enforce governance without sacrificing workflow isolation. You'll leave with a blueprint for taming Airflow at scale.
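Booking's DSL itself is not public, so the following is only a hypothetical sketch of the general pattern the abstract describes: a workflow.yaml spec is parsed at DAG-parse time and turned into a DAG built from reusable step templates. Field names, the YAML layout, and the single "bash" template are assumptions for illustration.

```python
# Hypothetical sketch of YAML-driven DAG generation; the real DSL and templates differ.
import yaml
from airflow.sdk import DAG
from airflow.providers.standard.operators.bash import BashOperator

WORKFLOW_YAML = """
name: orders_daily
schedule: "0 6 * * *"
steps:
  - name: extract
    template: bash
    command: "python extract_orders.py"
  - name: load
    template: bash
    command: "python load_orders.py"
    depends_on: [extract]
"""

spec = yaml.safe_load(WORKFLOW_YAML)

with DAG(dag_id=spec["name"], schedule=spec["schedule"]) as dag:
    tasks = {}
    for step in spec["steps"]:
        # Only one reusable "template" is shown; a real platform would map template
        # names to vetted, governed operator factories owned by the platform team.
        tasks[step["name"]] = BashOperator(task_id=step["name"], bash_command=step["command"])
    for step in spec["steps"]:
        for upstream in step.get("depends_on", []):
            tasks[upstream] >> tasks[step["name"]]
```

Keeping the spec declarative is what lets the platform team enforce standards centrally while tenants only edit their own workflow.yaml files.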

Historically, Airflow was only capable of time-based scheduling, where a DAG would run at certain times. For data that arrives at varying times, such as an external party delivering data to an S3 bucket, that meant having to run a DAG and continuously poll for updates. Airflow 3 introduces event-driven scheduling that lets you trigger DAGs based on such updates. In this talk, I'll demonstrate how this changes your DAG's code and how it works internally in Airflow. Lastly, I'll demonstrate a practical use case that leverages Airflow 3's event-driven scheduling.
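As a rough illustration of the difference, here is a hedged sketch: the DAG below is scheduled on an Asset whose AssetWatcher listens to a message queue (for example, S3 event notifications delivered to SQS), so it runs only when a delivery actually arrives instead of polling with a sensor. The queue URL, asset URI, and the trigger's import path and parameters are assumptions to verify against the Airflow 3 documentation.

```python
# Hedged sketch of Airflow 3 event-driven scheduling; URLs and trigger details are assumed.
from airflow.sdk import DAG, Asset, AssetWatcher, task
from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger

# The external party writes to S3; the bucket publishes notifications to an SQS queue.
partner_deliveries = Asset(
    "s3://partner-bucket/deliveries/",
    watchers=[
        AssetWatcher(
            name="partner_delivery_watcher",
            trigger=MessageQueueTrigger(
                queue="https://sqs.eu-west-1.amazonaws.com/123456789012/partner-deliveries",
            ),
        )
    ],
)

with DAG(dag_id="process_partner_delivery", schedule=[partner_deliveries]) as dag:

    @task
    def load_delivery():
        # Runs only when the watcher observes a new message; no polling sensor needed.
        ...

    load_delivery()
```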

We strive for our dbt project to be ready by 9am for our stakeholders. Should be easy, right? Except that our dbt project consists of around 450 dbt models and over 30 sources. Some of those sources are ready as early as midnight but some as late as 4am, and in total our project takes around 4 hours to run. Join us as we walk through the evolution of our dbt run setup, from one selector, to a set of parallel commands, to today's setup: a dynamic lineage in Airflow which runs models when and only when the upstream source is ready. It's finished when the Tableau datasource is refreshed and our stakeholders can start their day with the latest data.
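The team's actual dynamic-lineage setup is not shown in the abstract, but the underlying pattern can be sketched roughly as follows: each source's readiness updates an Asset, and a DAG scheduled on that Asset runs only the dbt models downstream of that source. Asset names, the selector, and the use of BashOperator are illustrative assumptions.

```python
# Illustrative sketch only; the actual Airflow/dbt setup described in the talk differs.
from airflow.sdk import DAG
from airflow.providers.standard.operators.bash import BashOperator
from airflow.sdk import Asset

# Another DAG (or an external event) marks this Asset as updated once the
# hypothetical "payments" source has landed.
payments_source_ready = Asset("source_payments_ready")

with DAG(dag_id="dbt_run_payments_downstream", schedule=[payments_source_ready]) as dag:
    # Run only the models downstream of that source, as soon as it is ready.
    BashOperator(
        task_id="dbt_run",
        bash_command='dbt run --select "source:payments+"',
    )
```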

Want to take your DAGs in Apache Airflow to the next level? Join this insightful session where we’ll uncover 5 transformative strategies to enhance your data workflows. Whether you’re a data engineering pro or just getting started, this presentation is packed with practical tips and actionable insights that you can apply right away. We’ll dive into the magic of using powerful libraries like Pandas, share techniques to trim down data volumes for faster processing, and highlight the importance of modularizing your code for easier maintenance. Plus, you’ll discover efficient ways to monitor and debug your DAGs, and how to make the most of Airflow’s built-in features. By the end of this session, you’ll have a toolkit of strategies to boost the efficiency and performance of your DAGs, making your data processing tasks smoother and more effective. Don’t miss out on this opportunity to elevate your Airflow DAGs!

There was a post on the data engineering subreddit recently that discussed how difficult it is to keep up with the data engineering world. Did you learn Hadoop? Great, we are on Snowflake, BigQuery, and Databricks now. Just learned Airflow? Well, now we have Airflow 3.0. And the list goes on. But what doesn’t change, and what have the lessons been over the past decade? That’s what I’ll be covering in this talk: real lessons and realities that come up time and time again, whether you’re working for a start-up or a large enterprise.

In today’s dynamic data environments, tables and schemas are constantly evolving, and keeping semantic layers up to date has become a critical operational challenge. Manual updates don’t scale, and delays can quickly lead to broken dashboards, failed pipelines, and lost trust. We’ll show how to harness Apache Airflow 3 and its new event-driven scheduling capabilities to automate the entire lifecycle: detecting table and schema changes in real time, parsing and interpreting those changes, and shifting the updating of semantic models left across dbt, Looker, or custom metadata layers. AI agents will add intelligence and automation that rationalize schema diffs, assess the impact of changes, and propose targeted updates to semantic layers, reducing manual work and minimizing the risk of errors. We’ll dive into strategies for efficient change detection, safe incremental updates, and orchestrating workflows where humans collaborate with AI agents to validate and deploy changes. By the end of the session, you’ll understand how to build resilient, self-healing semantic layers that minimize downtime, reduce manual intervention, and scale effortlessly across fast-changing data environments.

Curious how code truly flows inside Airflow? Join me for a unique visualisation journey into Airflow’s inner workings (the first of its kind): the code blocks and modules invoked when certain operations run. It is a walkthrough that unveils task execution, observability, and debugging like never before, and shows the scaling of Airflow in action with a performance comparison between Airflow 3 and Airflow 2. This session will demystify Airflow’s architecture, showcasing real-time task flows and the heartbeat of pipelines in action. Perfect for engineers looking to optimize workflows, troubleshoot efficiently, and gain a new perspective on Airflow’s powerful upgraded core. See Airflow running live with detailed insights and unlock the secrets to better pipeline management!

Are you looking to build slick, dynamic trigger forms for your DAGs? It all starts with mastering params. Params are the gold standard for adding execution options to your DAGs, allowing you to create dynamic, user-friendly trigger forms with descriptions, validation, and now, with Airflow 3, bidirectional support for conf data! In this talk, we’ll break down how to use params effectively, share best practices, and explore what’s new since the 2023 Airflow Summit talk (https://airflowsummit.org/sessions/2023/flexible-dag-trigger-forms-aip-50/). If you want to make DAG execution more flexible, intuitive, and powerful, this session is a must-attend!
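As a quick refresher on what params look like in a DAG definition, here is a minimal sketch. The parameter names and values are made up, and the import path shown is the Airflow 3 task SDK one; in Airflow 2, Param lives in airflow.models.param.

```python
# Minimal params sketch; parameter names and values are made up for illustration.
from airflow.sdk import dag, task, Param  # Airflow 2: from airflow.models.param import Param


@dag(
    params={
        "environment": Param(
            "staging",
            type="string",
            enum=["staging", "production"],
            description="Rendered as a dropdown in the trigger form",
        ),
        "row_limit": Param(
            1000,
            type="integer",
            minimum=1,
            maximum=100000,
            description="Validated against the JSON-schema bounds before the run starts",
        ),
    },
)
def parameterized_example():

    @task
    def report(params: dict | None = None):
        # `params` is injected from the run's conf/trigger form values.
        print(f"Running against {params['environment']} with limit {params['row_limit']}")

    report()


parameterized_example()
```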

In Airflow 2 there was a plugin mechanism to extend the UI with new functionality, as well as to add hooks and other features. Because Airflow 3 rewrote the UI, old plugins no longer work in all cases. Airflow 3.1 now provides a revamped option to extend the UI with a new plugin schema based on native React components and embedded iframes, following the AIP-68 definitions. In this session we will provide an overview of the capabilities and a short introduction to how you can roll your own.
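To give a flavor of the new mechanism, here is a heavily hedged sketch of an iframe-style UI plugin: the attribute name (external_views) and the dictionary keys are assumptions based on AIP-68 and should be checked against the Airflow 3.1 plugin documentation; the dashboard URL is hypothetical.

```python
# Hedged sketch of an Airflow 3.1 UI plugin (AIP-68); attribute and key names are assumptions.
from airflow.plugins_manager import AirflowPlugin


class CostDashboardPlugin(AirflowPlugin):
    name = "cost_dashboard_plugin"

    # Embed an existing page as an iframe entry in the Airflow UI navigation.
    external_views = [
        {
            "name": "Cost Dashboard",                               # menu label (assumed key)
            "href": "https://grafana.example.com/d/airflow-costs",  # hypothetical URL
            "destination": "nav",                                   # placement (assumed key)
            "url_route": "cost-dashboard",                          # route in the UI (assumed key)
        }
    ]
```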

Airflow has been used by many companies as a core part of their internal data platform. Would you be interested in finding out how Airflow can play a pivotal role in achieving data engineering excellence and efficiency with a modern data architecture? The best practices, tools, and setup to achieve a stable yet cost-effective way of running small or big workloads: let’s find out! In this workshop we will review how an organisation can set up a data platform architecture around Airflow and the necessary requirements:
Airflow and its role in the data platform
Different ways to organise the Airflow environment to enable scalability and stability
Useful open source libraries and custom plugins that improve efficiency
How to manage multi-tenancy and cost savings
Challenges and factors to keep in mind, using a success matrix!
This workshop should be suitable for any architect, data engineer, or DevOps engineer aiming to build or enhance their internal data platform. At the end of this workshop you will have a solid understanding of the initial setup and of ways to optimise further, getting the most out of the tool for your own organisation.

What if your Airflow tasks could understand natural language AND adapt to schema changes automatically, while maintaining the deterministic, observable workflows we rely on? This talk introduces practical patterns for AI-native orchestration that preserve Airflow’s strengths while adding intelligence where it matters most. Through a real-world example, we’ll demonstrate AI-powered tasks that detect schema drift across multi-cloud systems and perform context-aware data quality checks that go beyond simple validation: understanding business rules, detecting anomalies, and generating validation queries from prompts like “check data quality across regions.” All within static DAG structures you can test and debug normally. We’ll show how AI becomes a first-class citizen by combining Airflow features (assets for schema context, Human-in-the-Loop for approvals, and AssetWatchers for automated triggers) with engines such as Apache DataFusion for high-performance query execution and support for cross-cloud data processing with unified access to multiple storage formats. These patterns apply directly to schema validation and similar cases where natural language can simplify complex operations. This isn’t about bolting AI onto Airflow. It’s about evolving how we build workflows, from brittle rules to intelligent adaptation, while keeping everything testable, auditable, and production-ready.
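To make "AI-powered tasks inside a static DAG" concrete, here is a hypothetical sketch: the DAG shape stays fixed and testable, and only one task delegates query generation to a model. The generate_sql helper, the prompt, and the schema context are invented for illustration and are not from the talk.

```python
# Hypothetical sketch of an "AI-powered" data-quality task inside a static DAG.
# generate_sql() is invented for illustration; swap in your own model client.
from airflow.sdk import dag, task


def generate_sql(prompt: str, schema: dict) -> str:
    """Placeholder for an LLM call that turns a natural-language rule into SQL."""
    raise NotImplementedError("call your model provider here")


@dag(schedule=None)
def ai_quality_checks():

    @task
    def check_quality():
        schema = {"orders": ["order_id", "region", "amount", "created_at"]}  # made-up context
        sql = generate_sql("check data quality across regions", schema)
        # Execute `sql` against the warehouse and fail the task on violations.
        # The DAG structure itself never changes, so it stays testable and auditable.
        print(sql)

    check_quality()


ai_quality_checks()
```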

In this keynote, Peeyush Rai and Vikram Koka will walk through how Airflow is being used as part of an agentic AI platform serving insurance companies, which runs on all the major public clouds, leveraging models from OpenAI, Google (Gemini), and AWS (Claude and Bedrock). The talk walks through the details of the actual end-user business workflow, including gathering relevant financial data to make a decision, as well as the tricky challenge of handling AI hallucinations with new Airflow capabilities such as “Human in the loop”. This talk offers something for both business and technical audiences. Business users will get a clear view of what it takes to bring an AI application into production and how to align their operations and business teams with an AI-enabled workflow. Meanwhile, technical users will walk away with practical insights on how to orchestrate complex business processes, enabling seamless collaboration between Airflow, AI agents, and humans in the loop.

The workflow orchestration team at Zoox aims to build a solution for orchestrating heterogeneous workflows encompassing data, ML, and QA pipelines. We have encountered two primary challenges: first, the steep learning curve for new Airflow users and the need for a user-friendly yet scalable development process; second, integrating and migrating existing pipelines with established solutions. This presentation will detail our approach, as a small team at Zoox, to addressing these challenges. We will first present an exciting introduction to Zoox and what we do, then walk down memory lane through the past and present of Airflow use at Zoox. Furthermore, we will share our strategies for simplifying the Airflow DAG creation process and enhancing the user experience. Lastly, we will share a few of our thoughts on how to grow the team and Airflow’s presence at Zoox in the future.

Apache Bigtop is a time-proven open-source software stack for building data platforms, built around the Hadoop and Spark ecosystem since 2011. Its software composition has changed over that long period, and recently its job scheduler was removed, mainly due to the inactivity of its development. The speaker believes that Airflow fits this gap perfectly and proposes incorporating it into the Bigtop stack. This presentation will introduce how easily users can build a data platform with Bigtop including Airflow, and how Airflow can integrate that software through its wide range of providers and enterprise readiness, such as Kerberos support.

Join us to explore the DAG Upgrade Agent. Developed with the Google Agent Development Kit and powered by Gemini, the DAG Upgrade Agent uses a rules-based framework to analyze DAG code, identify compatibility issues between core Airflow and provider package versions, and generate precise upgrade recommendations and automated code conversions. Perfect for upcoming Airflow 3.0 migrations.
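The agent itself is not public, but the rules-based idea can be sketched in a few lines. The two import/argument rules below reflect real Airflow 2-to-3 changes; the structure of the checker is purely illustrative and does not represent the agent's actual rules, output format, or Gemini-powered code conversions.

```python
# Illustrative sketch of a rules-based DAG upgrade check; the real agent differs.
import re

UPGRADE_RULES = [
    (re.compile(r"\bschedule_interval\s*="),
     "Airflow 3 removed `schedule_interval`; use the `schedule` argument instead."),
    (re.compile(r"from airflow\.operators\.dummy import DummyOperator"),
     "DummyOperator is gone; use EmptyOperator instead."),
    (re.compile(r"\bexecution_date\b"),
     "The `execution_date` context key was removed in Airflow 3; use `logical_date`."),
]


def analyze_dag_source(source: str) -> list[str]:
    """Return human-readable upgrade recommendations for a DAG file's source code."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, recommendation in UPGRADE_RULES:
            if pattern.search(line):
                findings.append(f"line {lineno}: {recommendation}")
    return findings


if __name__ == "__main__":
    example = "dag = DAG('demo', schedule_interval='@daily')"
    print(analyze_dag_source(example))
```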