It's Friday! Matt Housley and I catch up to discuss the aftermath of AWS re:Invent and why the industry’s obsession with AI Agents might be premature. We also dive deep into the hardware wars between Google and NVIDIA, the "brain-damaged" nature of current LLMs, and the growing "enshittification" of the internet and platforms like LinkedIn. Plus, I reveal some details about my upcoming "Mixed Model Arts" project.
In this episode, I sit down with Mark Freeman and Chad Sanderson (Gable.ai) to discuss the release of their new O’Reilly book, Data Contracts: Developing Production-Grade Pipelines at Scale. They dive deep into the chaotic journey of writing a 350-page book while simultaneously building a venture-backed startup. The conversation takes a sharp turn into the evolution of Data Contracts. While the concept started with data engineers, Mark and Chad explain why they pivoted their focus to software engineers. They argue that software engineers are facing a "Data Lake Moment," prioritizing speed over craftsmanship, resulting in massive technical debt and integration failures.
Gable: https://www.gable.ai/
In this episode, Ciro Greco (Co-founder & CEO, Bauplan) joins me to discuss why the future of data infrastructure must be "Code-First" and how this philosophy accidentally created the perfect environment for AI Agents.
We explore why the "Modern Data Stack" isn't ready for autonomous agents and why a programmable lakehouse is the solution. Ciro explains that while we trust agents to write code (because we can roll it back), allowing them to write data requires strict safety rails.
He breaks down how Bauplan uses "Git for Data" semantics - branching, isolation, and transactionality - to provide an air-gapped sandbox where agents can safely operate without corrupting production data. Welcome to the future of the lakehouse.
Bauplan: https://www.bauplanlabs.com/
Data engineering is undergoing a fundamental shift. In this episode, I sit down with Nick Schrock, founder and CTO of Dagster, to discuss why he went from being an "AI moderate" to believing 90% of code will be written by AI. Being hands-on also led to a massive pivot in Dagster’s roadmap and a new focus on managing and engineering context. We dive deep into why simply feeding data to LLMs isn't enough. Nick explains why real-time context tools (like MCPs) can become "token hogs" that lack precision and why the future belongs to "context pipelines": offline, batch-computed context that is governed, versioned, and treated like code. We also explore Compass, Dagster’s new collaborative agent that lives in Slack, bridging the gap between business stakeholders and data teams. If you’re wondering how your role as a data engineer will evolve in an agentic world, this conversation maps out the territory.
Dagster: dagster.io
Nick Schrock on X: @schrockn
The days of easy entry into data jobs are over. Maggie Wolff joins the show to discuss the new reality of the data career landscape. We dive into why the bar is higher than ever and why "cold DMing" on LinkedIn is a terrible strategy. Maggie also breaks down her secret strategy for networking as an introvert: treating events like a game or role-playing a more extroverted friend. Plus, we discuss the rise of AI in education, the problem with "lazy" learning, and why companies replacing humans with AI are making a mistake.
Matt Housley joins me for our monthly round-up of topics. This time, there's danger everywhere - The AI Bubble, how vibe coding is evolving, AI slop, and more.
After 1,500+ conversations with CDOs and VPs of data, guest Malcolm Hawker noticed a disturbing pattern: a "limiting mindset" that causes data leaders to fail. He argues that too many leaders blame external factors such as "culture", "data literacy", or a lack of support rather than taking accountability for delivering value. In this conversation, Malcolm breaks down how this mindset is reinforced by the analyst and consultant community and why it leads to a "value fatigue" where no one can prove their own ROI. He offers a clear path forward, starting with a simple 3-question framework for any new CDO, and explains why "culture" is actually an outcome of delivering value, not a prerequisite for it. We also discuss his new book, "The Data Hero Playbook," and tackle the "AI Ready" myth, explaining why conflating it with "BI Ready" is holding companies back and why your data is likely "good enough" to start right now.
In this conversation, Dr. Cecilia Dones and I discuss the social skills we're losing as AI becomes more integrated into our lives. We explore the erosion of social norms, from AI companions joining Zoom calls without consent and endless enshittified content to my son's generation calling AI girlfriends "clankers". Is there hope? We break down the "rage currency" that dominates media and the positive AI stories that go unheard. The biggest takeaway: as the world becomes more synthetic, "showing up" in person will become the ultimate "premium value."
In conversations I've been having with leaders and practitioners, there are some open-ended questions about the impact of AI on vendors and open-source projects. If you don't have a moat, you need to start thinking about how AI coding tools will erode the edges of your product. And what about getting users and traction? I cover this and much more in this episode. Enjoy!
For years, data engineering was a story of predictable "pipelines": move data from point A to point B. But AI just hit the reset button on our entire field. Now, we're all staring into the void, wondering what's next. The fundamentals haven't changed: data governance, data management, and data modeling remain as challenging as ever. Everything else is up for grabs. This talk will cut through the noise and explore the future of data engineering in an AI-driven world. We'll examine how team structures will evolve, why agentic workflows and real-time systems are becoming non-negotiable, and how our focus must shift from building dashboards and analytics to architecting for automated action. The reset button has been pushed. It's time for us to invent the future of our industry.
Sujay Dutta and Sidd Rajagopal, authors of "Data as the Fourth Pillar," join the show to make the compelling case that for C-suite leaders obsessed with AI, data must be elevated to the same level as people, process, and technology. They provide a practical playbook for Chief Data Officers (CDOs) to escape the "cost center" trap by focusing on the "demand side" (business value) instead of just the "supply side" (technology). They also introduce frameworks like "Data Intensity" and "Total Addressable Value (TAV)" for data. We also tackle the reality of AI "slopware" and the "Great Pacific garbage patch" of junk data, explaining how to build the critical "context" (or "Data Intelligence Layer") that most GenAI projects are missing. Finally, they explain why the CDO must report directly to the CEO to play "offense," not defense.
Matt Turck (VC at FirstMark) joins the show to break down the most controversial MAD (Machine Learning, AI, and Data) Landscape yet. This year, the team "declared bankruptcy" and cut over 1,000 logos to better reflect the market reality: a "Cambrian explosion" of AI companies and a fierce "struggle and tension between the very large companies and the startups".
Matt discusses why incumbents are "absolutely not lazy", which categories have "largely just gone away" (like Customer Data Platforms and Reverse ETL), and what new categories (like AI Agents and Local AI) are emerging. We also cover his investment thesis in a world dominated by foundation models, the "very underestimated" European AI scene, and whether an AI could win a Nobel Prize by 2027.
https://www.mattturck.com/mad2025
Jeremiah Lowin, founder of Prefect, returns to the show to discuss the seismic shift in the data and AI landscape since our last conversation a few years ago. He shares the wild origin story of FastMCP, a project he started to create a more "Pythonic" wrapper for Anthropic's Model Context Protocol (MCP).
Jeremiah explains how this side project was incorporated into Anthropic's official SDK and then exploded to over a million downloads a day after MCP gained support from OpenAI and Google. He clarifies why this is a complementary expansion for Prefect, not a pivot, and provides a simple analogy for MCP as the "USB-C for AI agents". Most surprisingly, Jeremiah reveals that the primary adoption of MCP isn't for external products, but internally by data teams who are using it to finally fulfill the promise of the self-serve semantic layer and create a governable, "LLM-free zone" for AI tools.
There's no shortage of technical content for data engineers, but a massive gap exists when it comes to the non-technical skills required to advance beyond a senior role. I sit down with Yordan Ivanov, Head of Data Engineering and writer of "Data Gibberish," to talk about this disconnect. We dive into his personal journey of failing as a manager the first time, learning the crucial "people" skills, and his current mission to help data engineers learn how to speak the language of business.
Key areas we explore:
The Senior-Level Content Gap: Yordan explains why his non-technical content on career strategy and stakeholder communication gets "terrible" engagement compared to technical posts, even though it's what's needed to advance.
The Managerial Trap: Yordan's candid story about his first attempt at management, where he failed because he cared only about code and wasn't equipped for the people-centric aspects and politics of the role.
The Danger of AI Over-reliance: A deep discussion on how leaning too heavily on AI can prevent the development of fundamental thinking and problem-solving skills, both in coding and in life.
The Maturing Data Landscape: We reflect on the end of the "modern data stack euphoria" and what the wave of acquisitions means for innovation and the future of data tooling.
AI Adoption in Europe vs. the US: A look at how AI adoption is perceived as massive and mandatory in Europe, while US census data shows surprisingly low enterprise adoption rates.
The world of data is being reset by AI, and the infrastructure needs to evolve with it. I sit down with streaming legend Tyler Akidau to discuss how the principles of stream processing are forming the foundation for the next generation of "agentic AI" systems. Tyler, who was an AI cynic until recently, explains why he's now convinced that AI agents will fundamentally change how businesses operate and what problems we need to solve to deploy them safely.
Key topics we explore:
From Human Analytics to Agentic Systems: How data architectures built for human analysis must be re-imagined for a world with thousands of AI agents operating at machine speed.
Auditing Everything: Why managing AI requires a new level of governance where we must record all data an agent touches, not just metadata, to diagnose its complex and opaque behavior.
The End of Windowing's Dominance: Tyler reflects on the influential Dataflow paper he co-authored and explains why he now sees a table-based abstraction as a more powerful and user-friendly model than focusing on windowing.
The D&D Alignment of AI: Tyler's brilliant analogy for why enterprises are struggling to adopt AI: we're trying to integrate "chaotic" agents into systems built for "lawful good" employees.
A Reset for the Industry: Why the rise of AI feels like the early 2010s of streaming, where the problems are unsolved and everyone is trying to figure out the answers.
Are dashboards dead? For complex enterprise use cases, the answer might be yes. In this episode, I'm joined by Irina Malkova (VP Data & AI at Salesforce) to discuss her team's transformational journey from building complex dashboards to deploying AI-powered conversational agents. We dive deep into how this shift is not just a change in tooling, but a fundamental change in how users access insights and how data teams measure their impact.
Join us as we cover:
The Shift from Dashboards to Agents: We discuss why dashboards can create a high cognitive load and fail users in complex scenarios, and how conversational agents in the flow of work (like Slack) provide targeted, actionable insights and boost adoption.
What is Product Telemetry?: Irina explains how telemetry is evolving from a simple engineering observability use case to a critical data source for AI, machine learning, and recommendation systems.
Why Standard RAG Fails in the Enterprise: Irina shares why typical RAG approaches break down on dense, entity-rich corporate data (like Salesforce's help docs) where semantic similarity isn't enough, leading to the rise of Graph RAG.
The New, Measurable ROI of Data: How moving from BI to agents allows data teams to precisely measure impact, track downstream actions, and finally have a concrete answer to the ROI question that was previously impossible to justify.
Data Teams as Enterprise Leaders: Why data teams are uniquely positioned to lead AI transformation, as they hold the enterprise "ontology" and have experience building products under uncertainty.
It's all about acquisitions, acquisitions, acquisitions! Matt Housley joins me to tackle the biggest rumor in the data world this week: the potential acquisition of dbt Labs by Fivetran. This news sparks a wide-ranging discussion on the inevitable consolidation of the Modern Data Stack, a trend we predicted as the era of zero-interest-rate policy ended. We also talk about financial pressures, vendor exposure to the rise of AI, the future of data tooling, and more.
In this episode, I sit down with Saket Saurabh (CEO of Nexla) to discuss the fundamental shift happening in the AI landscape. The conversation is moving beyond the race to build the biggest foundational models and towards a new battleground: context. We explore what it means to be a "model company" versus a "context company" and how this changes everything for data strategy and enterprise AI.
Join us as we cover:
Model vs. Context Companies: The emerging divide between companies building models (like OpenAI) and those whose advantage lies in their unique data and integrations.
The Limits of Current Models: Why we might be hitting an asymptote with the current transformer architecture for solving complex, reliable business processes.
"Context Engineering": What this term really means, from RAG to stitching together tools, data, and memory to feed AI systems.
The Resurgence of Knowledge Graphs: Why graph databases are becoming critical for providing deterministic, reliable information to probabilistic AI models, moving beyond simple vector similarity.
AI's Impact on Tooling: How tools like Lovable and Cursor are changing workflows for prototyping and coding, and the risk of creating the "-10x engineer."
The Future of Data Engineering: How the field is expanding as AI becomes the primary consumer of data, requiring a new focus on architecture, semantics, and managing complexity at scale.
Practicing analytics well takes more than just tools and tech. It requires data modeling practices that unify and empower all teams within analytics, from engineers to analysts. This is especially true as AI becomes a part of analytics. Without a governed data model that provides consistent data interpretation, AI tools are left to guess. Join panelists Joe Reis, Sarah Levy, Harry Gollop, Rob Hulme, Shachar Meir, and Guy Fighel as they share battle-tested advice on overcoming conflicting definitions and accurately mapping business intent to data, reports, and dashboards at scale. This panel is for data & analytics engineers seeking a clear framework to capture business logic across layers, and for data leaders focused on building a reliable foundation for Gen AI.