talk-data.com

Topic

Modern Data Stack

298 tagged

Activity Trend

28 peak/qtr (2020-Q1 to 2026-Q1)

Activities

298 activities · Newest first

For the past decade, SQL has reigned as king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now Ibis can provide the same benefits of SQL execution with a flexible Python dataframe API.

In this talk, you will learn how Ibis supercharges open-source libraries like Kedro, Pandera, and the Boring Semantic Layer and how you can combine these technologies (and a few more) to build and orchestrate scalable data engineering pipelines without sacrificing the comfort (and other advantages) of Python.
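The pitch above, that a Python dataframe API can deliver the benefits of SQL execution, rests on deferred expressions that compile down to SQL a backend engine runs. The toy below sketches only that compilation idea; the class and method names are invented for illustration and are not Ibis's actual API.

```python
# Toy sketch of the idea behind a dataframe-to-SQL layer like Ibis:
# expressions are built lazily in Python, then compiled to a SQL string
# that any engine could execute. Names here are hypothetical.
class Table:
    def __init__(self, name):
        self.name = name
        self._where, self._group, self._aggs = [], [], {}

    def filter(self, condition):            # condition as raw SQL, e.g. "amount > 0"
        self._where.append(condition)
        return self

    def group_by(self, *cols):
        self._group.extend(cols)
        return self

    def aggregate(self, **aggs):            # alias -> SQL expression
        self._aggs.update(aggs)
        return self

    def to_sql(self):
        select = list(self._group) + [f"{expr} AS {alias}" for alias, expr in self._aggs.items()]
        sql = f"SELECT {', '.join(select)} FROM {self.name}"
        if self._where:
            sql += " WHERE " + " AND ".join(self._where)
        if self._group:
            sql += " GROUP BY " + ", ".join(self._group)
        return sql

orders = Table("orders")
sql = orders.filter("amount > 0").group_by("region").aggregate(total="SUM(amount)").to_sql()
print(sql)
# SELECT region, SUM(amount) AS total FROM orders WHERE amount > 0 GROUP BY region
```

Because nothing executes until the compiled SQL is handed to an engine, the same Python expression can target DuckDB locally and a cloud warehouse in production, which is the portability argument the talk makes for Ibis.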

In this episode, Ciro Greco (Co-founder & CEO, Bauplan) joins me to discuss why the future of data infrastructure must be "Code-First" and how this philosophy accidentally created the perfect environment for AI Agents.

We explore why the "Modern Data Stack" isn't ready for autonomous agents and why a programmable lakehouse is the solution. Ciro explains that while we trust agents to write code (because we can roll it back), allowing them to write data requires strict safety rails.

He breaks down how Bauplan uses "Git for Data" semantics - branching, isolation, and transactionality - to provide an air-gapped sandbox where agents can safely operate without corrupting production data. Welcome to the future of the lakehouse.

Bauplan: https://www.bauplanlabs.com/
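The "Git for Data" semantics Ciro describes, branching, isolation, and transactionality, can be sketched in a few lines. This is a hypothetical illustration of the safety model only, with invented names; it is not Bauplan's API.

```python
# Hypothetical sketch of "Git for Data": an agent writes to an isolated
# branch, and its changes only reach production via an atomic merge.
# Names and structure are illustrative, not Bauplan's implementation.
import copy

class Lakehouse:
    def __init__(self):
        self.branches = {"main": {}}               # branch -> {table: rows}

    def branch(self, name, source="main"):
        # Isolated snapshot: writes here never touch the source branch.
        self.branches[name] = copy.deepcopy(self.branches[source])

    def write(self, branch, table, rows):
        self.branches[branch][table] = rows

    def merge(self, source, target="main"):
        # Atomic, all-or-nothing promotion of the branch's state.
        self.branches[target] = copy.deepcopy(self.branches[source])

lake = Lakehouse()
lake.write("main", "orders", [{"id": 1, "amount": 10.0}])

# An agent experiments on its own branch...
lake.branch("agent-run-42")
lake.write("agent-run-42", "orders", [{"id": 1, "amount": -999.0}])

# ...and production is untouched until someone (or a passing test) merges.
assert lake.branches["main"]["orders"][0]["amount"] == 10.0
lake.merge("agent-run-42")
```

The rollback story for data mirrors the one for code: a bad agent run is simply a branch that never gets merged.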

There's no shortage of technical content for data engineers, but a massive gap exists when it comes to the non-technical skills required to advance beyond a senior role. I sit down with Yordan Ivanov, Head of Data Engineering and writer of "Data Gibberish," to talk about this disconnect. We dive into his personal journey of failing as a manager the first time, learning the crucial "people" skills, and his current mission to help data engineers learn how to speak the language of business. Key areas we explore:

- The Senior-Level Content Gap: Yordan explains why his non-technical content on career strategy and stakeholder communication gets "terrible" engagement compared to technical posts, even though it's what's needed to advance.

- The Managerial Trap: Yordan's candid story about his first attempt at management, where he failed because he cared only about code and wasn't equipped for the people-centric aspects and politics of the role.

- The Danger of AI Over-reliance: A deep discussion on how leaning too heavily on AI can prevent the development of fundamental thinking and problem-solving skills, both in coding and in life.

- The Maturing Data Landscape: We reflect on the end of the "modern data stack euphoria" and what the wave of acquisitions means for innovation and the future of data tooling.

- AI Adoption in Europe vs. the US: A look at how AI adoption is perceived as massive and mandatory in Europe, while US census data shows surprisingly low enterprise adoption rates.

Ryan Dolley, VP of Product Strategy at GoodData and co-host of the Super Data Brothers podcast, joined Yuliia and Dumke to discuss the dbt-Fivetran merger and what it signals about the modern data stack's consolidation phase. After 16 years in BI and analytics, Ryan explains why BI adoption has been stuck at 27% for a decade and why simply adding AI chatbots won't solve it. He argues that at large enterprises, purchasing new software is actually the only viable opportunity to change company culture - not because of the features, but because it forces operational pauses and new ways of working. Ryan shares his take that AI will struggle with BI because LLMs are trained to give emotionally satisfying answers rather than accurate ones. Find Ryan Dolley on LinkedIn.

Step into a dynamic, interactive session where you'll experience the data transformation journey from multiple angles: Data Engineer, Manager, Analytics VP, and Chief Data Officer. This immersive tabletop exercise isn’t your typical panel or demo—it’s a high-empathy, scenario-driven experience designed to build cross-role understanding and alignment across the modern data stack. Each scene drops you into a real-world challenge—whether it's data trust issues, managing cost pressures, or preparing for an AI initiative—and forces a go/no-go decision with your peers. You’ll explore how your choices impact others across the org, from the technical trenches to the boardroom. Whether you're a practitioner, leader, or executive, this session will help you see data not just as pipelines and dashboards, but as a 360-degree opportunity to drive business change. Walk away with a clear picture of the capabilities your team needs (without naming products) and a roadmap for building champions across your org.

Join LTIMindtree and Scania to explore how the modern data stack—powered by cloud services and a data product mindset—transforms raw data into governed, reusable, and business-aligned assets. By applying domain-driven design and Data Mesh principles, organizations can decentralize ownership, align data products with business needs, and foster scalable collaboration. This approach accelerates decision-making, strengthens data-driven strategies, and delivers measurable business outcomes.

EQT, a global investment organization specializing in private capital, infrastructure, and real assets, has transformed its data operations by fully adopting the modern data stack. As a cloud-native company with hundreds of internal and external data sources — from YouTube to Google Cloud Storage — EQT needed a scalable, centralized solution to ingest and transform data for complex financial use cases. Their journey took them from fragmented, Excel-based workflows to a robust, integrated data pipeline powered by Fivetran.

In this session, you’ll learn how:

- EQT streamlined external data ingestion and broke down data silos

- A unified data pipeline supports scalable financial analytics and decision-making

- Fivetran's ease of use, connector maintenance, and cost-effectiveness made it the clear choice

All Response Media, Europe’s number-one performance media agency, turned a fragmented reporting process into a competitive advantage by combining Snowflake, Domo, and AI Agents. What once took 2.5 days now takes a single hour, with campaign insights delivered in real time and usage up 230 percent across the agency. In this session, discover how All Response Media built a scalable modern data stack, introduced Agents that translate data into client-ready insights, and created transparency that gives their global brand customers a genuine edge.

Discover how Stellantis tackled the challenge of data quality at scale. After the merger of PSA and FCA, the automaker put in place a Modern Data Stack (Snowflake, dltHub, and dbt) to unify its data and guarantee its excellence. You will dive into the technical architecture that was implemented and discover the project's concrete impact on key use cases, such as supply chain optimization and cost reduction. A unique opportunity to learn the best practices and lessons drawn from a project of this scale.

Summary: In this episode of the Data Engineering Podcast, Vijay Subramanian, founder and CEO of Trace, talks about metric trees - a new approach to data modeling that directly captures a company's business model. Vijay shares insights from his decade-long experience building data practices at Rent the Runway and explains how the modern data stack has led to a proliferation of dashboards without a coherent way for business consumers to reason about cause, effect, and action. He explores how metric trees differ from and interoperate with other data modeling approaches, serve as a backend for analytical workflows, and provide concrete examples like modeling Uber's revenue drivers and customer journeys. Vijay also discusses the potential of AI agents operating on metric trees to execute workflows, organizational patterns for defining inputs and outputs with business teams, and a vision for analytics that becomes invisible infrastructure embedded in everyday decisions.
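The core structure the episode describes, a tree where each business metric is derived from its child drivers, can be sketched minimally. This is an illustrative data structure under my own assumptions, not Trace's implementation.

```python
# Minimal sketch of a metric tree: each node is a business metric whose
# value is computed from its children, so a top-line number can be traced
# down to its drivers. Names and structure are illustrative only.
import math

class Metric:
    def __init__(self, name, value=None, combine=None, children=()):
        self.name = name
        self._value = value                # set for leaf (observed) metrics
        self.combine = combine             # how children roll up to this node
        self.children = list(children)

    def value(self):
        if self._value is not None:        # leaf: an observed input metric
            return self._value
        return self.combine(c.value() for c in self.children)

# Revenue = rides * average fare, in the spirit of the Uber example
rides = Metric("rides", value=1000)
avg_fare = Metric("avg_fare", value=12.5)
revenue = Metric("revenue", combine=math.prod, children=[rides, avg_fare])
print(revenue.value())  # 12500.0
```

Because the edges encode how drivers combine, a change in the top-line metric can be attributed to specific children, which is the "reason about cause, effect, and action" property a wall of disconnected dashboards lacks.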

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Vijay Subramanian about metric trees and how they empower more effective and adaptive analytics.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you describe what metric trees are and their purpose?
- How do metric trees relate to metric/semantic layers?
- What are the shortcomings of existing data modeling frameworks that prevent effective use of those assets?
- How do metric trees build on top of existing investments in dimensional data models?
- What are some strategies for engaging with the business to identify metrics and their relationships?
- What are your recommendations for storage, representation, and retrieval of metric trees?
- How do metric trees fit into the overall lifecycle of organizational data workflows?
- When creating any new data asset it introduces overhead of maintenance, monitoring, and evolution. How do metric trees fit into existing testing and validation frameworks that teams rely on for dimensional modeling?
- What are some of the key differences in useful evaluation/testing that teams need to develop for metric trees?
- How do metric trees assist in context engineering for AI-powered self-serve access to organizational data?
- What are the most interesting, innovative, or unexpected ways that you have seen metric trees used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on metric trees and operationalizing them at Trace?
- When is a metric tree the wrong abstraction?
- What do you have planned for the future of Trace and applications of metric trees?

Contact Info

- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

- Metric Tree
- Trace
- Modern Data Stack
- Hadoop
- Vertica
- Luigi
- dbt
- Ralph Kimball
- Bill Inmon
- Metric Layer
- Dimensional Data Warehouse
- Master Data Management
- Data Governance
- Financial P&L (Profit and Loss)
- EBITDA (Earnings Before Interest, Taxes, Depreciation, and Amortization)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

It's all about acquisitions, acquisitions, acquisitions! Matt Housley joins me to tackle the biggest rumor in the data world this week: the potential acquisition of dbt Labs by Fivetran. This news sparks a wide-ranging discussion on the inevitable consolidation of the Modern Data Stack, a trend we predicted as the era of zero-interest-rate policy ended. We also talk about financial pressures, vendor exposure to the rise of AI, the future of data tooling, and more.

Data sovereignty concerns are increasingly central for CIOs and executive leadership. At the same time, Modern Data Stack concepts, built largely on cloud technologies, deliver remarkable efficiency for IT teams serving the business.

During our session, we will show how the performance of the MDS can be reconciled with sovereignty requirements.

Allianz replaced its legacy campaign system with a cloud-native Campaign Data Hub (CDH) powered by Snowflake, unifying data from three core business lines onto a single, real-time platform. This modern architecture reduces costs and frees up IT resources, while empowering over 8,000 agents with on-demand customer insights and the ability to pivot campaign messaging in minutes. The result is a strategic shift from legacy complexity to data-driven growth, enabling Allianz to launch hyper-personalized campaigns at scale and drive a sharp increase in agent productivity and conversion rates.

Discover how GLS tracks, analyzes, and optimizes more than 1 million parcels every day thanks to a high-performing Modern Data Stack and BI designed for efficiency.

You will see concretely how to turn complex data flows into clear, fast decisions in the service of operational performance.

Leave with concrete insights on:

- Tools and methods for making your data reliable in real time

- Data visualization best practices for steering at scale

- The business impact of effective data governance

An inspiring session not to be missed if you want to boost your decision-making with data.

Toucan is the Embedded Analytics solution that simplifies access to data and helps companies make better decisions through clear dashboards that are quick to deploy and accessible to everyone.

GLS, a major player in parcel delivery in Europe, relies on data to deliver reliability, performance, and quality of service to its millions of customers every day.

Unlocking dbt: Design and Deploy Transformations in Your Cloud Data Warehouse

Master the art of data transformation with the second edition of this trusted guide to dbt. Building on the foundation of the first edition, this updated volume offers a deeper, more comprehensive exploration of dbt’s capabilities—whether you're new to the tool or looking to sharpen your skills. It dives into the latest features and techniques, equipping you with the tools to create scalable, maintainable, and production-ready data transformation pipelines. Unlocking dbt, Second Edition introduces key advancements, including the semantic layer, which allows you to define and manage metrics at scale, and dbt Mesh, empowering organizations to orchestrate decentralized data workflows with confidence. You’ll also explore more advanced testing capabilities, expanded CI/CD and deployment strategies, and enhancements in documentation—such as the newly introduced dbt Catalog. As in the first edition, you’ll learn how to harness dbt’s power to transform raw data into actionable insights, while incorporating software engineering best practices like code reusability, version control, and automated testing. From configuring projects with the dbt Platform or open source dbt to mastering advanced transformations using SQL and Jinja, this book provides everything you need to tackle real-world challenges effectively. 
What You Will Learn

- Understand dbt and its role in the modern data stack
- Set up projects using both the cloud-hosted dbt Platform and open source dbt
- Connect dbt projects to cloud data warehouses
- Build scalable models in SQL and Python
- Configure development, testing, and production environments
- Capture reusable logic with Jinja macros
- Incorporate version control with your data transformation code
- Seamlessly connect your projects using dbt Mesh
- Build and manage a semantic layer using dbt
- Deploy dbt using CI/CD best practices

Who This Book Is For

Current and aspiring data professionals, including architects, developers, analysts, engineers, data scientists, and consultants who are beginning the journey of using dbt as part of their data pipeline's transformation layer. Readers should have a foundational knowledge of writing basic SQL statements, development best practices, and working with data in an analytical context such as a data warehouse.
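A central idea the book covers is that dbt models are SQL templated with Jinja, where `{{ ref('model_name') }}` is resolved to a concrete relation at compile time. The toy resolver below illustrates only that one compilation step; real dbt uses full Jinja plus project configuration, and the database and schema names here are invented.

```python
import re

# Toy stand-in for dbt's compilation of {{ ref('model_name') }} into a
# fully qualified relation. Real dbt uses Jinja and project config;
# the "analytics.marts" qualification here is a hypothetical example.
def compile_model(sql, database="analytics", schema="marts"):
    def resolve(match):
        model = match.group(1)
        return f"{database}.{schema}.{model}"
    return re.sub(r"\{\{\s*ref\('([^']+)'\)\s*\}\}", resolve, sql)

model_sql = "SELECT customer_id, SUM(amount) AS ltv FROM {{ ref('stg_orders') }} GROUP BY 1"
print(compile_model(model_sql))
# SELECT customer_id, SUM(amount) AS ltv FROM analytics.marts.stg_orders GROUP BY 1
```

Because references are declared rather than hard-coded, dbt can also derive the dependency graph between models from these `ref()` calls, which is what enables ordered builds and features like dbt Mesh.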

For the past decade, SQL has reigned as king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now Ibis can provide the same benefits of SQL execution with a flexible Python dataframe API.

In this talk, you will learn how Ibis supercharges existing open-source libraries like Kedro and Pandera and how you can combine these technologies (and a few more) to build and orchestrate scalable data engineering pipelines without sacrificing the comfort (and other advantages) of Python.

Data teams have a bad habit: reinventing the wheel. Despite the explosion of open-source tooling, best practices, and managed services, teams still find themselves building bespoke data platforms from scratch—often hitting the same roadblocks as those before them. Why does this keep happening, and more importantly, how can we break the cycle? In this talk, we’ll unpack the key reasons data teams default to building rather than adopting, from technical nuances to cultural and organizational dynamics. We’ll discuss why fragmentation in the modern data stack, the pressure to “own” infrastructure, and the allure of in-house solutions make this problem so persistent. Using real-world examples, we’ll explore strategies to help data teams focus on delivering business value rather than endlessly rebuilding foundational infrastructure. Whether you’re an engineer, a data leader, or an open-source contributor, this session will provide insights into navigating the build-vs-buy tradeoff more effectively.