dbt

Revolutionizing Python Notebooks with Marimo

2025-07-28 · Data Engineering Podcast Listen

podcast_episode

by Akshay Agrawal (Marimo) , Tobias Macey

AI/ML Airflow Analytics Dagster Data Engineering Data Lakehouse Data Management Datafold Oracle Prefect Python Snowflake

Summary In this episode of the Data Engineering Podcast Akshay Agrawal from Marimo discusses the innovative new Python notebook environment, which offers a reactive execution model, full Python integration, and built-in UI elements to enhance the interactive computing experience. He discusses the challenges of traditional Jupyter notebooks, such as hidden states and lack of interactivity, and how Marimo addresses these issues with features like reactive execution and Python-native file formats. Akshay also explores the broader landscape of programmatic notebooks, comparing Marimo to other tools like Jupyter, Streamlit, and Hex, highlighting its unique approach to creating data apps directly from notebooks and eliminating the need for separate app development. The conversation delves into the technical architecture of Marimo, its community-driven development, and future plans, including a commercial offering and enhanced AI integration, emphasizing Marimo's role in bridging the gap between data exploration and production-ready applications.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementTired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to 6x while guaranteeing accuracy? Datafold's Migration Agent is the only AI-powered solution that doesn't just translate your code; it validates every single data point to ensure perfect parity between your old and new systems. Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multi-system migrations, they deliver production-ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they're turning months-long migration nightmares into week-long success stories.Your host is Tobias Macey and today I'm interviewing Akshay Agrawal about Marimo, a reusable and reproducible Python notebook environmentInterview IntroductionHow did you get involved in the area of data management?Can you describe what Marimo is and the story behind it?What are the core problems and use cases that you are focused on addressing with Marimo?What are you explicitly not trying to solve for with Marimo?Programmatic notebooks have been around for decades now. Jupyter was largely responsible for making them popular outside of academia. How have the applications of notebooks changed in recent years?What are the limitations that have been most challenging to address in production contexts?Jupyter has long had support for multi-language notebooks/notebook kernels. What is your opinion on the utility of that feature as a core concern of the notebook system?Beyond notebooks, Streamlit and Hex have become quite popular for publishing the results of notebook-style analysis. How would you characterize the feature set of Marimo for those use cases?For a typical data team that is working across data pipelines, business analytics, ML/AI engineering, etc. How do you see Marimo applied within and across those contexts?One of the common difficulties with notebooks is that they are largely a single-player experience. They may connect into a shared compute cluster for scaling up execution (e.g. Ray, Dask, etc.). How does Marimo address the situation where a data platform team wants to offer notebooks as a service to reduce the friction to getting started with analyzing data in a warehouse/lakehouse context?How are you seeing teams integrate Marimo with orchestrators (e.g. Dagster, Airflow, Prefect)?What are some of the most interesting or complex engineering challenges that you have had to address while building and evolving Marimo?\What are the most interesting, innovative, or unexpected ways that you have seen Marimo used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Marimo?When is Marimo the wrong choice?What do you have planned for the future of Marimo?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links MarimoJupyterIPythonStreamlitPodcast.init EpisodeVector EmbeddingsDimensionality ReductionKagglePytestPEP 723 script dependency metadataMatLabVisicalcMathematicaRMarkdownRShinyElixir LivebookDatabricks NotebooksPapermillPluto - Julia NotebookHexDirected Acyclic Graph (DAG)Sumble Kaggle founder Anthony Goldblum's startupRayDaskJupytextnbdevDuckDBPodcast EpisodeIcebergSupersetjupyter-marimo-proxyJupyterHubBinderNixAnyWidgetJupyter WidgetsMatplotlibAltairPlotlyDataFusionPolarsMotherDuckThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

How Amazon S3 works (w/ Andy Warfield)

2025-07-20 · The Analytics Engineering Podcast Listen

podcast_episode

by Tristan Handy (dbt Labs) , Andy Warfield (Amazon)

Analytics Analytics Engineering AWS Cloud Computing Iceberg S3

In this season of the Analytics Engineering podcast, Tristan is deep into the world of developer tools and databases. If you're following us here, you've almost definitely used Amazon S3 it and its Blob Storage siblings. They form the foundation for nearly all data work in the cloud. In many ways, it was the innovations that happened inside of S3 that have unlocked all of the progress in cloud data over the last decade. In this episode, Tristan talks with Andy Warfield, VP and senior principal engineer at AWS, where he focuses primarily on storage. They go deep on S3, how it works, and what it unlocks. They close out italking about Iceberg, S3 table buckets, and what this all suggests about the outlines of the S3 product roadmap moving forward. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Madison Schott - From Analytics Engineer to Content Creator

2025-07-16 · The Joe Reis Show Listen

podcast_episode

by Madison Schott , Joe Reis (DeepLearning.AI)

Analytics Data Modelling SQL

Madison Schott joins me to chat about about her journey from working as an analytics engineer to creating content about dbt, SQL, data modeling, and more.

Like a dentist 🦷- How Bergzeit manages +2000 dbt data tests on imperfect data

2025-07-15 · Munich dbt Meetup

talk

Leverage dbt docs for Genie AI/BI

2025-07-15 · Munich dbt Meetup

talk

AI/ML BI

Introducing the dbt Fusion Engine

2025-07-15 · Munich dbt Meetup

talk

Python is all you need: an overview of the composable, Python-native data stack

2025-07-09 · SciPy 2025

talk

by Deepyaman Datta

API Data Engineering Modern Data Stack Python SQL

For the past decade, SQL has reigned king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now Ibis can provide the same benefits of SQL execution with a flexible Python dataframe API.

In this talk, you will learn how Ibis supercharges existing open-source libraries like Kedro and Pandera and how you can combine these technologies (and a few more) to build and orchestrate scalable data engineering pipelines without sacrificing the comfort (and other advantages) of Python.

Data Observability in dbt

2025-07-05 · Lagos dbt Meetup #8 (in-person)

talk

by Obiechina Iyi

dbt Fusion

2025-07-05 · Lagos dbt Meetup #8 (in-person)

talk

Leveraging dbt OSS for end-end Analytics Engineering

2025-07-05 · Lagos dbt Meetup #8 (in-person)

talk

by Abiodun Sanni

How to use CI/CD pipelines for dbt and deploy dbt Cloud using Infrastructure as Code + sneak peak into dbt Fusion

2025-07-01 · 22th Eindhoven Data Community Meetup | Studyportals

talk

by Koen Graat (Xebia Data)

Terraform ci/cd dbt cloud dbt fusion

Are you tired of manual and failing dbt deployments? This talk explores how CI/CD and IaC can revolutionize your data transformation workflows, enhancing collaboration and data quality within your dbt projects. Learn the core concepts of CI/CD, including automated testing and deployment pipelines, I will guide you through building a CI/CD pipeline for dbt, triggering it with code changes and running comprehensive tests. Next to that we will dive into Infrastructure as Code (IaC) and how it automates dbt Cloud deployments using tools like Terraform. You will gain practical knowledge for automating dbt Cloud resources, projects, and environments. As a bonus we will do a sneak peek into the recently announced dbt Fusion engine.

The Journey to Just-In-Time Data: Evolving our dbt Orchestration with Airflow

2025-07-01 · 22th Eindhoven Data Community Meetup | Studyportals

talk

by Armand Duijen (Studyportals) , Katie Scheitzer (Studyportals)

Airflow

We strive for our dbt project to be ready by 9am for our stakeholders. Should be easy, right? Except that our dbt project consists of around 450 dbt models and over 30 sources. Some of those sources are ready as early as midnight but some as late as 4am, and in total our project takes around 4 hours to run. Join as us we walk through the evolution of our dbt run setup, from one selector, to a set of parallel commands, to today's setup -- a dynamic lineage in Airflow which runs models when and only when the upstream source is ready. It's finished when the Tableau datasource is refreshed and our stakeholders can start their day with the latest data.