talk-data.com
Activities & events
The State of Airflow 2026: London Airflow Meetup!
2026-01-28 · 17:30
Join fellow Airflow enthusiasts and leaders at Salisbury House for an evening of engaging talks, great food and drinks, and exclusive swag! We'll start you off with a deep dive into the Airflow 2026 survey results, and finish off with a community member presentation on the Apache TinkerPop provider.

PRESENTATIONS

Talk #1: The State of Apache Airflow® 2026
Apache Airflow® continues to thrive as the world’s leading open-source data orchestration platform, with 30M downloads per month and over 3k contributors. 2025 marked a major milestone with the release of Airflow 3, which introduced DAG versioning, enhanced security and task isolation, assets, and more. These changes have reshaped how data teams build, operate, and govern their pipelines. In this session, our speaker will share insights from the State of Airflow 2026 report, including:
Join us to hear directly from a leader in the community and discover how to get the most out of Airflow in the year ahead.

Talk #2: Building the Apache TinkerPop Provider for Airflow
Graph databases are powering everything from recommendation engines to fraud detection, but integrating graph operations into modern data pipelines has often required custom code and workarounds. Earlier this year, Ahmad built a new Apache TinkerPop provider for Airflow, making it easier than ever to orchestrate Gremlin queries, manage graph workloads, and connect Airflow to TinkerPop-enabled systems. In this session, you’ll learn:
Join us to explore how Airflow and TinkerPop can work together to streamline graph workflows and unlock new patterns in modern data pipelines.

AGENDA
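For readers curious what "orchestrating Gremlin queries from Airflow" (Talk #2 above) can look like in practice, here is a minimal, illustrative sketch. The announcement does not document the provider's actual hook or operator names, so this stand-in simply calls the standard gremlinpython driver from a TaskFlow task; the endpoint and traversal are placeholders.

```python
# Illustrative sketch only: the TinkerPop provider's real hooks/operators are
# not described above, so the plain gremlinpython driver is used as a stand-in.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2026, 1, 1), catchup=False, tags=["graph"])
def graph_pipeline():
    @task
    def count_accounts() -> int:
        # Placeholder endpoint; gremlinpython is the standard TinkerPop Python driver.
        from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
        from gremlin_python.process.anonymous_traversal import traversal

        conn = DriverRemoteConnection("ws://graph-host:8182/gremlin", "g")
        try:
            g = traversal().withRemote(conn)
            return g.V().hasLabel("account").count().next()
        finally:
            conn.close()

    count_accounts()


graph_pipeline()
```

A dedicated provider would typically wrap the connection handling above in an Airflow connection and hook, so only the traversal logic lives in the task.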
Not Just for Engineers: What Agentic Data Management Unlocks
2025-12-10 · 19:00
Most conversations about automating data workflows center on the engineering stack—but that’s only part of the story. The real payoff comes downstream, where data scientists, analysts, and business leaders need fast, reliable access to data to make informed decisions. In this session, the Matillion team will demonstrate how an agentic system like Maia shifts repetitive data tasks into autonomous execution, freeing up teams to focus on high-impact analysis and modeling. We’ll walk through how this shift reduces backlog, accelerates time-to-insight, and increases trust in the data that drives both machine learning pipelines and executive dashboards. Through real-world examples, attendees will see how agentic workflows empower the entire data organization—not just engineering.

Key Takeaways:
1️⃣ Workload Lift for Analysts & Scientists: How automation reduces manual prep and speeds up modeling cycles
2️⃣ From Backlog to Business Value: How agentic orchestration gets more data into production, faster
3️⃣ Organizational Trust in Data: Why agentic systems improve confidence in the data used for decision-making
4️⃣ Cross-Functional Impact: How these changes improve collaboration between data engineering, analytics, and leadership teams

PANELISTS TO BE ANNOUNCED SOON
Why organisations struggle with change and what to do about it
2025-12-04 · 05:00
Jason Foster – guest, Sunil Kumar – Chief Transformation Officer
Most organisations don't struggle with change because of strategy or technology, they struggle because change is fundamentally human. In this episode of Hub & Spoken, Jason Foster, CEO & Founder of Cynozure, speaks with Sunil Kumar, Chief Transformation Officer, to explore why transformation so often stalls and what leaders can do to make it stick. Drawing on more than 26 years working across airlines, telecoms, finance and FMCG, Sunil explains why context, such as geopolitics, customer behaviour, industry shifts and internal culture, is the deciding factor in how change lands. When leaders ignore that context, resistance and fatigue follow.

Jason and Sunil discuss the human realities behind change, including:
• Why people naturally resist it
• How values and beliefs influence adoption
• Why narrative and excitement matter more than familiar project metrics

Sunil also shares his practical "push, pull, connect" model for building momentum and why adoption, not go-live, should be the true measure of success. 🎧 Listen to the full episode now

Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023 and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation.
Hub & Spoken: Data | Analytics | Chief Data Officer | CDO | Data Strategy |
Hammerspace Breaks IO500 Barriers: How They Built the Fastest NFS-Based Benchmark Ever w/ Jon Flynn
2025-11-20 · 20:59
Molly Presley – host, Jonathan Flynn – Director of Applied Systems @ Hammerspace
In this landmark 100th episode of Data Unchained, host Molly Presley sits down with Jonathan Flynn, Director of Applied Systems at Hammerspace, live from Supercomputing 2025. Together they explore the performance engineering breakthroughs that enabled Hammerspace and Samsung to deliver a historic IO500 10 Node Production result using only standard Linux, the upstream NFSv4.2 client, and off the shelf NVMe hardware. This episode breaks down how the Hammerspace Data Platform delivered more than a 33 percent gain over earlier submissions, doubled overall bandwidth, and achieved an unprecedented 809 percent improvement in the IO Hard Read test using Samsung PM1753 Gen 5 NVMe SSDs. Jonathan explains the Linux kernel innovations, metadata advancements, IO path optimization, parallel file system breakthroughs, and multi instance file placement strategies that allowed Hammerspace to reach genuine HPC class performance without proprietary clients or custom networking. Listeners get a detailed walkthrough of the architectural differences between Research and Production IO500 submissions, the impact of metadata redundancy, the performance benefits of NFSd direct and NFS direct, the role of ZFS locking improvements, and how upstream Linux contributions directly advanced the state of HPC and AI data infrastructure. Jonathan also highlights the evolution of MLPerf benchmarking, the benefits of tier zero storage, and how Hammerspace performance engineering is unlocking new levels of efficiency and scalability for AI training, scientific workloads, and large scale analytics. This episode is essential for AI architects, HPC engineers, kernel developers, data scientists, and infrastructure leaders building the next generation of high performance data platforms. Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information. |
Data Unchained Podcast
Speaker: Bianca Stratulat
Start Date: Thu, Nov 20th 2025 · 7:00 PM EEST (5:00 PM GMT)
Language: ENGLISH
Location: Online (link visible for attendees)

Description: We’ll explore how to integrate Databricks and Power BI effectively, enabling your organisation to unlock real-time analytics and create impactful data stories. You’ll learn how to:
• Leverage the Medallion architecture to design scalable and maintainable data workflows in Databricks.
• Seamlessly connect Power BI to Databricks and consume data for reporting and dashboards.
• Apply best practices for Direct Query vs Import, balancing real-time insights with performance and scalability.
Whether you’re a data engineer, analyst, or Power BI enthusiast, this session will provide practical techniques and lessons learned from real-world implementations to help you supercharge your analytics capabilities.

At the end of the Meetup we'll have a Raffle with prizes offered by EDNA: 1 FREE one-year access Licence on the EDNA Platform for one lucky winner from the live attendees!

Speaker: Bianca Stratulat, Databricks Champion & Chief Data Officer at UnifEye
Bianca is a Databricks Champion and Chief Data Officer at UnifEye, where she helps organisations unlock the full potential of their data using modern platforms and AI. With over 10 years of experience in data engineering, analytics, and visual storytelling, she specialises in building scalable data solutions that bridge the gap between technical teams and business leaders. She has been a speaker at the Databricks Data + AI Summit 2025 in San Francisco and regularly presents at industry events on topics like Lakehouse architecture, real-time analytics, and best practices for integrating Databricks with Power BI. Bianca is passionate about empowering data communities and helping teams turn complex data into actionable insights that drive innovation and growth. Connect with Bianca here:
From Lakehouse to Dashboards: Integrate Databricks with Power BI | Bianca Stratulat
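As a rough illustration of the bronze/silver/gold (Medallion) flow the session above covers, here is a minimal PySpark sketch. The catalog, schema, and path names (demo.bronze.orders and so on) are placeholders, not anything from the session materials.

```python
# Illustrative Medallion-style flow in PySpark; table and path names are made up.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: raw ingest, stored as-is so it can be replayed later.
raw = spark.read.json("/Volumes/demo/raw/orders/")
raw.write.mode("append").saveAsTable("demo.bronze.orders")

# Silver: deduplicated, typed, and filtered records.
silver = (
    spark.table("demo.bronze.orders")
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("amount") > 0)
)
silver.write.mode("overwrite").saveAsTable("demo.silver.orders")

# Gold: aggregated, business-ready table that Power BI reads.
gold = (
    spark.table("demo.silver.orders")
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(
        F.sum("amount").alias("daily_revenue"),
        F.countDistinct("customer_id").alias("customers"),
    )
)
gold.write.mode("overwrite").saveAsTable("demo.gold.daily_revenue")
```

Power BI then connects to the gold table through the Databricks connector; broadly, Import mode caches the data inside Power BI for fast visuals, while DirectQuery pushes each query back to Databricks at view time, trading some latency for fresher results.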
The AI Data Paradox: High Trust in Models, Low Trust in Data
2025-11-09 · 23:53
Ariel Pohoryles – guest @ Rivery, Tobias Macey – host
Summary In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems, only 50% trust their organization's data overall. Ariel explains why truly productionizing AI demands broader, continuously refreshed data with stronger automation and governance, and highlights the challenges posed by unstructured data and vector stores. The conversation covers the need to shift from manual reviews to automated pipelines, the resurgence of metadata and master data management, and the importance of guardrails, traceability, and agent governance. Ariel also predicts a growing convergence between data teams and application integration teams and advises leaders to focus on high-value use cases, aggressive pipeline automation, and cataloging and governing the coming sprawl of AI agents, all while using AI to accelerate data engineering itself. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. 
And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Ariel Pohoryles about data management investments that organizations are making to enable them to scale AI implementationsInterview IntroductionHow did you get involved in the area of data management?Can you start by describing the motivation and scope of your recent survey on data management investments for AI across your respondents?What are the key takeaways that were most significant to you?The survey reveals a fascinating paradox: 77% of leaders trust the data used by their AI systems, yet only half trust their organization's overall data quality. For our data engineering audience, what does this suggest about how companies are currently sourcing data for AI? Does it imply they are using narrow, manually-curated "golden datasets," and what are the technical challenges and risks of that approach as they try to scale?The report highlights a heavy reliance on manual data quality processes, with one expert noting companies feel it's "not reliable to fully automate validation" for external or customer data. At the same time, maturity in "Automated tools for data integration and cleansing" is low, at only 42%. What specific technical hurdles or organizational inertia are preventing teams from adopting more automation in their data quality and integration pipelines?There was a significant point made that with generative AI, "biases can scale much faster," making automated governance essential. From a data engineering perspective, how does the data management strategy need to evolve to support generative AI versus traditional ML models? What new types of data quality checks, lineage tracking, or monitoring for feedback loops are required when the model itself is generating new content based on its own outputs?The report champions a "centralized data management platform" as the "connective tissue" for reliable AI. How do you see the scale and data maturity impacting the realities of that effort?How do architectural patterns in the shape of cloud warehouses, lakehouses, data mesh, data products, etc. factor into that need for centralized/unified platforms?A surprising finding was that a third of respondents have not fully grasped the risk of significant inaccuracies in their AI models if they fail to prioritize data management. In your experience, what are the biggest blind spots for data and analytics leaders?Looking at the maturity charts, companies rate themselves highly on "Developing a data management strategy" (65%) but lag significantly in areas like "Automated tools for data integration and cleansing" (42%) and "Conducting bias-detection audits" (24%). If you were advising a data engineering team lead based on these findings, what would you tell them to prioritize in the next 6-12 months to bridge the gap between strategy and a truly scalable, trustworthy data foundation for AI?The report states that 83% of companies expect to integrate more data sources for their AI in the next year. 
For a data engineer on the ground, what is the most important capability they need to build into their platform to handle this influx?What are the most interesting, innovative, or unexpected ways that you have seen teams addressing the new and accelerated data needs for AI applications?What are some of the noteworthy trends or predictions that you have for the near-term future of the impact that AI is having or will have on data teams and systems?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links BoomiData ManagementIntegration & Automation DemoAgentstudioData Connector Agent WebinarSurvey ResultsData GovernanceShadow ITPodcast EpisodeThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA |
Data Engineering Podcast |
Building data excellence: culture, value and the human side of AI
2025-10-30 · 05:00
Jason Foster – guest, Roberto Maranca – VP of Data Excellence @ Schneider Electric
In this episode of Hub & Spoken, Jason Foster, CEO and Founder of Cynozure, speaks with Roberto Maranca, data & digital transformation expert and author of Data Excellence. They explore what it really means to build a 'data fit' organisation, one that treats data capability like physical fitness by understanding where you are, training for where you want to be and making improvement a daily routine. Drawing from ancient philosophy and modern business, Roberto explains how concepts from Socrates and Aristotle can help leaders rethink culture, value and human responsibility in an AI-driven world.

Together, they discuss how organisations can:
• Shift from seeing data as a tech issue to a leadership mindset
• Build collective intelligence and cultural readiness
• Stay human in the age of intelligent machines

Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023 and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation.
Hub & Spoken: Data | Analytics | Chief Data Officer | CDO | Data Strategy |
AI-Driven Software Testing
Srinivasa Rao Bittla – author
AI-Driven Software Testing explores how Artificial Intelligence (AI) and Machine Learning (ML) are revolutionizing quality engineering (QE), making testing more intelligent, efficient, and adaptive. The book begins by examining the critical role of QE in modern software development and the paradigm shift introduced by AI/ML. It traces the evolution of software testing, from manual approaches to AI-powered automation, highlighting key innovations that enhance accuracy, speed, and scalability. Readers will gain a deep understanding of quality engineering in the age of AI, comparing traditional and AI-driven testing methodologies to uncover their advantages and challenges. Moving into practical applications, the book delves into AI-enhanced test planning, execution, and defect management. It explores AI-driven test case development, intelligent test environments, and real-time monitoring techniques that streamline the testing lifecycle. Additionally, it covers AI’s impact on continuous integration and delivery (CI/CD), predictive analytics for failure prevention, and strategies for scaling AI-driven testing across cloud platforms. Finally, it looks ahead to the future of AI in software testing, discussing emerging trends, ethical considerations, and the evolving role of QE professionals in an AI-first world. With real-world case studies and actionable insights, AI-Driven Software Testing is an essential guide for QE engineers, developers, and tech leaders looking to harness AI for smarter, faster, and more reliable software testing. What you will learn: • What are the key principles of AI/ML-driven quality engineering • What is intelligent test case generation and adaptive test automation • Explore predictive analytics for defect prevention and risk assessment • Understand integration of AI/ML tools in CI/CD pipelines Who this book is for: Quality Engineers looking to enhance software testing with AI-driven techniques. Data Scientists exploring AI applications in software quality assurance and engineering. Software Developers – Engineers seeking to integrate AI/ML into testing and automation workflows. |
O'Reilly AI & ML Books
The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies
2025-10-18 · 22:35
Kate Shaw – Senior Product Manager for Data and SLIM @ SnapLogic, Tobias Macey – host
Summary In this episode Kate Shaw, Senior Product Manager for Data and SLIM at SnapLogic, talks about the hidden and compounding costs of maintaining legacy systems—and practical strategies for modernization. She unpacks how “legacy” is less about age and more about when a system becomes a risk: blocking innovation, consuming excess IT time, and creating opportunity costs. Kate explores technical debt, vendor lock-in, lost context from employee turnover, and the slippery notion of “if it ain’t broke,” especially when data correctness and lineage are unclear. She digs into governance, observability, and data quality as foundations for trustworthy analytics and AI, and why exit strategies for system retirement should be planned from day one. The discussion covers composable architectures to avoid monoliths and big-bang migrations, how to bridge valuable systems into AI initiatives without lock-in, and why clear success criteria matter for AI projects. Kate shares lessons from the field on discovery, documentation gaps, parallel run strategies, and using integration as the connective tissue to unlock data for modern, cloud-native and AI-enabled use cases. She closes with guidance on planning migrations, defining measurable outcomes, ensuring lineage and compliance, and building for swap-ability so teams can evolve systems incrementally instead of living with a “bowl of spaghetti.”
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect. Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey and today I'm interviewing Kate Shaw about the true costs of maintaining legacy systems.
Interview Introduction: How did you get involved in the area of data management?What are your criteria for when a given system or service transitions to being "legacy"?In order for any service to survive long enough to become "legacy" it must be serving its purpose and providing value.
What are the common factors that prompt teams to deprecate or migrate systems?What are the sources of monetary cost related to maintaining legacy systems while they remain operational?Beyond monetary cost, economics also have a concept of "opportunity cost". What are some of the ways that manifests in data teams who are maintaining or migrating from legacy systems?How does that loss of productivity impact the broader organization?How does the process of migration contribute to issues around data accuracy, reliability, etc. as well as contributing to potential compromises of security and compliance?Once a system has been replaced, it needs to be retired. What are some of the costs associated with removing a system from service?What are the most interesting, innovative, or unexpected ways that you have seen teams address the costs of legacy systems and their retirement?What are the most interesting, unexpected, or challenging lessons that you have learned while working on legacy systems migration?When is deprecation/migration the wrong choice?How have evolutionary architecture patterns helped to mitigate the costs of system retirement?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links SnapLogicSLIM == SnapLogic Intelligent ModernizerOpportunity CostSunk Cost FallacyData GovernanceEvolutionary ArchitectureThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA |
Data Engineering Podcast |
Building Enterprise AI That Works
2025-10-15 · 23:00
Topic: Building the Next Generation of Enterprise AI: From Intelligent Automation to Document Search with RAG

Description: The promise of AI is here, but how do we move from hype to tangible business value? Organizations today are drowning in unstructured data and slowed by complex manual workflows. The next generation of enterprise AI offers a powerful solution, capable of not just automating tasks but understanding, reasoning, and interacting with information in unprecedented ways. Join Bibin Prathap, a Microsoft MVP for AI and a seasoned AI & Analytics Leader, for a deep dive into the practical architecture and application of modern enterprise AI. Drawing from his hands-on experience building an AI-driven workflow automation platform and a generative AI document explorer, Bibin will demystify the core technologies transforming the modern enterprise. This session will provide a technical roadmap for building impactful, scalable, and intelligent systems.

What You Will Learn:
Who Should Attend: This session is designed for AI Engineers, Data Scientists, Software Architects, Developers, and Tech Leaders who are responsible for implementing AI solutions and driving digital transformation.

ABOUT US
WeCloudData is the leading accredited education institute in North America that focuses on Data Science, Data Engineering, DevOps, Artificial Intelligence, and Business Intelligence. Developed by industry experts and hiring managers, and highly recognized by our hiring partners, WeCloudData’s learning paths have helped many students make successful transitions into data and DevOps roles that fit their backgrounds and passions. WeCloudData provides a different and more practical teaching methodology, so that students not only learn the technical skills but also acquire the soft skills that will make them stand out in a work environment. WeCloudData has also partnered with many big companies to help them adopt the latest tech in Data, AI, and DevOps. Visit our website for more information: https://weclouddata.com
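Since the session title centers on document search with RAG, here is a minimal sketch of the retrieval-augmented generation loop. It is purely illustrative and assumes nothing about the speaker's actual stack: embed() and generate() stand in for whatever embedding model and LLM endpoint an implementation uses.

```python
# Minimal RAG skeleton: embed() and generate() are placeholders for real model APIs.
from typing import Callable

import numpy as np


def build_index(chunks: list[str], embed: Callable[[str], np.ndarray]) -> np.ndarray:
    """Embed every document chunk once, up front."""
    return np.vstack([embed(c) for c in chunks])


def answer(question: str, chunks: list[str], index: np.ndarray,
           embed: Callable[[str], np.ndarray],
           generate: Callable[[str], str], k: int = 3) -> str:
    # Retrieve the k chunks most similar to the question (cosine similarity).
    q = embed(question)
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q) + 1e-9)
    context = "\n\n".join(chunks[i] for i in np.argsort(scores)[-k:][::-1])
    # Ground the model's answer in the retrieved context.
    return generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```

The same skeleton extends naturally to a vector database in place of the in-memory matrix.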
Context Engineering as a Discipline: Building Governed AI Analytics
2025-10-11 · 21:36
Nick Schrock – guest, Tobias Macey – host
Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Compass - a Slack-native, agentic analytics system designed to keep data teams connected with business stakeholders. Nick shares his journey from initial skepticism to embracing agentic AI as model and application advancements made it practical for governed workflows, and explores how Compass redefines the relationship between data teams and stakeholders by shifting analysts into steward roles, capturing and governing context, and integrating with Slack where collaboration already happens. The conversation covers organizational observability through Compass's conversational system of record, cost control strategies, and the implications of agentic collaboration on Conway's Law, as well as what's next for Compass and Nick's optimistic views on AI-accelerated software engineering. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. 
Your host is Tobias Macey and today I'm interviewing Nick Schrock about building an AI analyst that keeps data teams in the loopInterview IntroductionHow did you get involved in the area of data management?Can you describe what Compass is and the story behind it?context repository structurehow to keep it relevant/avoid sprawl/duplicationproviding guardrailshow does a tool like Compass help provide feedback/insights back to the data teams?preparing the data warehouse for effective introspection by the AILLM selectioncost managementcaching/materializing ad-hoc queriesWhy Slack and enterprise chat are important to b2b softwareHow AI is changing stakeholder relationshipsHow not to overpromise AI capabilities How does Compass relate to BI?How does Compass relate to Dagster and Data Infrastructure?What are the most interesting, innovative, or unexpected ways that you have seen Compass used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Compass?When is Compass the wrong choice?What do you have planned for the future of Compass?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links DagsterDagster LabsDagster PlusDagster CompassChris Bergh DataOps EpisodeRise of Medium Code blog postContext EngineeringData StewardInformation ArchitectureConway's LawTemporal durable execution frameworkThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA |
Data Engineering Podcast
The Data Model That Captures Your Business: Metric Trees Explained
2025-10-05 · 23:59
Vijay Subramanian – Founder and CEO @ Trace, Tobias Macey – host
Summary In this episode of the Data Engineering Podcast Vijay Subramanian, founder and CEO of Trace, talks about metric trees - a new approach to data modeling that directly captures a company's business model. Vijay shares insights from his decade-long experience building data practices at Rent the Runway and explains how the modern data stack has led to a proliferation of dashboards without a coherent way for business consumers to reason about cause, effect, and action. He explores how metric trees differ from and interoperate with other data modeling approaches, serve as a backend for analytical workflows, and provide concrete examples like modeling Uber's revenue drivers and customer journeys. Vijay also discusses the potential of AI agents operating on metric trees to execute workflows, organizational patterns for defining inputs and outputs with business teams, and a vision for analytics that becomes invisible infrastructure embedded in everyday decisions. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Your host is Tobias Macey and today I'm interviewing Vijay Subramanian about metric trees and how they empower more effective and adaptive analyticsInterview IntroductionHow did you get involved in the area of data management?Can you describe what metric trees are and their purpose?How do metric trees relate to metric/semantic layers?What are the shortcomings of existing data modeling frameworks that prevent effective use of those assets?How do metric trees build on top of existing investments in dimensional data models?What are some strategies for engaging with the business to identify metrics and their relationships?What are your recommendations for storage, representation, and retrieval of metric trees?How do metric trees fit into the overall lifecycle of organizational data workflows?When creating any new data asset it introduces overhead of maintenance, monitoring, and evolution. 
How do metric trees fit into existing testing and validation frameworks that teams rely on for dimensional modeling?What are some of the key differences in useful evaluation/testing that teams need to develop for metric trees?How do metric trees assist in context engineering for AI-powered self-serve access to organizational data?What are the most interesting, innovative, or unexpected ways that you have seen metric trees used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on metric trees and operationalizing them at Trace?When is a metric tree the wrong abstraction?What do you have planned for the future of Trace and applications of metric trees?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links Metric TreeTraceModern Data StackHadoopVerticaLuigidbtRalph KimballBill InmonMetric LayerDimensional Data WarehouseMaster Data ManagementData GovernanceFinancial P&L (Profit and Loss)EBITDA ==Earnings before interest, taxes, depreciation and amortizationThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA |
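To make the idea concrete, here is a toy sketch of a metric tree as a data structure: a parent metric is computed from its child driver metrics, so an analyst (or an agent) can trace a change in the output back through its inputs. The revenue decomposition below is invented for illustration and is not Trace's actual model.

```python
# Toy metric tree: each node is either measured directly (a leaf input) or
# rolled up from its children. The decomposition is a made-up example.
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Metric:
    name: str
    children: list["Metric"] = field(default_factory=list)
    combine: Optional[Callable[[list[float]], float]] = None  # how children roll up
    value: Optional[float] = None  # set directly for leaf (input) metrics

    def evaluate(self) -> float:
        if self.value is not None:
            return self.value
        return self.combine([c.evaluate() for c in self.children])


# revenue = completed_rides * average_fare; completed_rides = requests * completion_rate
requests = Metric("ride_requests", value=120_000)
completion = Metric("completion_rate", value=0.9)
avg_fare = Metric("average_fare", value=14.5)
rides = Metric("completed_rides", [requests, completion], combine=lambda xs: xs[0] * xs[1])
revenue = Metric("revenue", [rides, avg_fare], combine=lambda xs: xs[0] * xs[1])

print(revenue.evaluate())  # 1566000.0
```

In practice the tree would sit on top of governed warehouse tables rather than hard-coded values, with each leaf bound to a defined metric query.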
Data Engineering Podcast
The future of work isn't about AI. It's about us.
2025-10-02 · 05:00
Jason Foster – guest
AI is no longer a distant concept; it's here, reshaping the way we live and work. From coding and customer service to creative content, AI is already taking on tasks once thought to be uniquely human. But what does that mean for the future of work, and more importantly, for the role of leaders? In this solo episode of Hub & Spoken, Jason Foster, CEO and Founder of Cynozure, explores the real implications of AI on jobs, leadership, and human value. Drawing lessons from history, automation, shipping containers, even the rise of personal computing, Jason argues that every wave of technology has shifted humans "up a level of abstraction," moving us from doing to designing, to directing and innovating.

He sets out four essential human traits to thrive in the age of AI:
• Think bigger – focus on outcomes, strategy, and imagination
• Lead differently – provide clarity, orchestrate teams, and build culture
• Connect deeper – lean into empathy, context, and trust
• Grow and adapt – stay curious, resilient, and open to change

🎧 Tune in to hear Jason's take on how we can design the future we want to be part of.

Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023 and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation.
Hub & Spoken: Data | Analytics | Chief Data Officer | CDO | Data Strategy |
Beyond Analytics: The DataOps Conference
2025-09-16 · 16:00
Learn how modern DataOps is enabling advanced use cases beyond analytics in a half-day of virtual sessions where data leaders share how they leverage orchestration to power AI, ML, and production-grade data products.
WHAT TO EXPECT AT BEYOND ANALYTICS
BECOME A CERTIFIED AIRFLOW 3 EXPERT
Join Airflow expert Marc Lamberti, creator of Data with Marc, for a fast-track, expert-led introduction to Apache Airflow 3 fundamentals — designed to help you confidently prepare for the official Airflow 3 certification exam. Get your questions answered live, plus you’ll get a discount code for a free certification ($150 value).

FEATURED SPEAKERS
BRINGING TOGETHER THE DATA ECOSYSTEM
Beyond Analytics is presented by Astronomer and supported by the following partners. ❗Please Note: you must register on the event page here to save your spot.