talk-data.com

Topic

Cloud Computing

infrastructure saas iaas

476 tagged

Activity Trend

471 peak/qtr
2020-Q1 2026-Q1

Activities

476 activities · Newest first

Send us a text Hit replay on this high-energy conversation with Ben Lowe, Founder and CEO of Lighthouse Technology, as he breaks down what it really takes to build something from scratch in a fast-changing AI world. From finding the right co‑founder and stepping into the CEO role, to navigating the toughest challenges of entrepreneurship, Ben shares practical lessons and hard‑won insights for builders at every stage. Ben also lifts the curtain on Lighthouse Technology’s mission at the intersection of cloud modernisation, cost optimisation, and Agentic AI, and how enterprises can turn AI hype into outcomes with the right GTM motion and architecture. If you care about innovation, AI for the enterprise, or your own startup journey, this replay is packed with career gold, strategic perspective, and a candid look at what’s next.

01:34 Finding an Entrepreneurial Partner 04:43 The Creation of Lighthouse 08:10 Becoming the CEO 09:34 The Biggest Challenge of Entrepreneurship 11:30 Lighthouse Technology 19:07 Agentic AI 20:16 The Lighthouse Trajectory 22:12 The GTM 31:39 Reseller versus Innovator 36:53 AI for the Enterprise 41:35 The 2-Min Pitch 43:30 What's True?

LinkedIn: https://www.linkedin.com/in/lowebenjamin/?originalSubdomain=uk Website: https://lighthousetechnology.ai/

#Entrepreneurship #AgenticAI #TechLeadership #Innovation #BusinessGrowth #CareerAdvice #AIForEnterprise #StartupJourney #MakingDataSimple #LighthouseTechnology #PodcastReplay

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

In this talk, Xia He-Bleinagel, Head of Data & Cloud at NOW GmbH, shares her remarkable journey from studying automotive engineering across Europe to leading modern data, cloud, and engineering teams in Germany. We dive into her transition from hands-on engineering to leadership, how she balanced family with career growth, and what it really takes to succeed in today’s cloud, data, and AI job market.

TIMECODES: 00:00 Studying Automotive Engineering Across Europe 08:15 How Andrew Ng Sparked a Machine Learning Journey 11:45 Import–Export Work as an Unexpected Career Boost 17:05 Balancing Family Life with Data Engineering Studies 20:50 From Data Engineer to Head of Data & Cloud 27:46 Building Data Teams & Tackling Tech Debt 30:56 Learning Leadership Through Coaching & Observation 34:17 Management vs. IC: Finding Your Best Fit 38:52 Boosting Developer Productivity with AI Tools 42:47 Succeeding in Germany’s Competitive Data Job Market 46:03 Fast-Track Your Cloud & Data Career 50:03 Mentorship & Supporting Working Moms in Tech 53:03 Cultural & Economic Factors Shaping Women’s Careers 57:13 Top Networking Groups for Women in Data 1:00:13 Turning Domain Expertise into a Data Career Advantage

Connect with Xia- Linkedin - https://www.linkedin.com/in/xia-he-bleinagel-51773585/ - Github - https://github.com/Data-Think-2021 - Website - https://datathinker.de/

Connect with DataTalks.Club: - Join the community - https://datatalks.club/slack.html - Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ - Check other upcoming events - https://lu.ma/dtc-events - GitHub: https://github.com/DataTalksClub - LinkedIn - https://www.linkedin.com/company/datatalks-club/ - Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/

Summary

In this crossover episode, Max Beauchemin explores how multiplayer, multi‑agent engineering is transforming the way individuals and teams build data and AI systems. He digs into the shifting boundary between data and AI engineering, the rise of “context as code,” and how just‑in‑time retrieval via MCP and CLIs lets agents gather what they need without bloating context windows. Max shares hard‑won practices from going “AI‑first” for most tasks, where humans focus on orchestration and taste, and the new bottlenecks that appear — code review, QA, async coordination — when execution accelerates 2–10x. He also dives deep into Agor, his open‑source agent orchestration platform: a spatial, multiplayer workspace that manages Git worktrees and live dev environments, templatizes prompts by workflow zones, supports session forking and sub‑sessions, and exposes an internal MCP so agents can schedule, monitor, and even coordinate other agents.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Maxime Beauchemin about the impact of multi-player multi-agent engineering on individual and team velocity for building better data systems.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by giving an overview of the types of work that you are relying on AI development agents for?
- As you bring agents into the mix for software engineering, what are the bottlenecks that start to show up?
- In my own experience there are a finite number of agents that I can manage in parallel. How does Agor help to increase that limit?
- How does making multi-agent management a multi-player experience change the dynamics of how you apply agentic engineering workflows?

Contact Info
- LinkedIn

Links
- Agor
- Apache Airflow
- Apache Superset
- Preset
- Claude Code
- Codex
- Playwright MCP
- Tmux
- Git Worktrees
- Opencode.ai
- GitHub Codespaces
- Ona

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

In this episode, Conor and Bryce record live from C++ Under the Sea! We interview Bernhard and Koen, talk about C++26 Reflection, and more!

Link to Episode 261 on Website. Discuss this episode, leave a comment, or ask a question (on GitHub).

Socials
- ADSP: The Podcast: Twitter
- Conor Hoekstra: Twitter | BlueSky | Mastodon
- Bryce Adelstein Lelbach: Twitter

About the Guests: Bernhard is a senior system software engineer at NVIDIA, where he extends, optimizes, and maintains the CUDA Core Compute Libraries (CCCL). Previously, he worked as a software engineer among physicists at CERN on real-time and embedded software for the Large Hadron Collider, as well as on data layout abstractions for heterogeneous architectures, for which he received a PhD in High Performance Computing from the University of Dresden, Germany. Before that, he implemented GPU-accelerated simulations and 3D visualizations of industrial machining processes. Since 2022, Bernhard has been a voting member of WG21, and his interests span geometry, 3D visualizations, optimization, SIMD, GPU computing, refactoring, and teaching C++. Koen is an engineer specializing in high-quality software with a strong mathematical foundation. With a PhD in Computer Science from KU Leuven, his work bridges applied mathematics and performance-critical software engineering. As Team Lead for HMI Software at NV Michel Van de Wiele, he focuses on developing C++/Qt applications for textile production systems, optimizing performance, usability, and cloud integration. Passionate about elegant, efficient solutions, Koen brings deep expertise in numerical methods, system optimization, and software architecture.
Show Notes

Date Recorded: 2025-10-10
Date Released: 2025-11-21

- Thrust Docs
- CUB Library
- C++26 Reflection Proposal
- ADSP Episode 39: How Steve Jobs Saved Sean Parent
- Parrot
- Parrot on GitHub
- Sean's C++ Under the Sea Keynote
- Parrot sum

Intro Song Info: Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons — Attribution 3.0 Unported — CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Send us a text Dive into the powerful world of mainframes! Chief Product Officer of IBM Z and LinuxONE, Tina Tarquinio, reveals the truth behind those eight nines of uptime and explores how mainframes are evolving with AI, hybrid cloud, and future-proofing strategies for mission-critical business decisions. 

Discover the cutting-edge innovations transforming enterprise computing—from on-chip AIU and Spyre AI accelerators enabling real-time inferencing at transaction speed, to how LinuxONE is redefining hybrid cloud architecture. Tina discusses DevOps integration, AI-powered code assistants revolutionizing mainframe development, compelling AI use cases, and shares her bold predictions for the mainframe’s next 100 years. Plus, career advice from a tech leader and what she does for fun!

00:46 Tina Tarquinio 03:18 The Most Mainframe Surprise 09:12 What IS the Mainframe Really? 8 Nines! 14:40 On-Chip AIU, Spyre Inferencing 18:11 Mainframes with Hybrid Cloud 19:11 The LinuxONE Pitch 19:59 Exciting Mainframe Innovations 22:09 DevOps 23:36 Code Assistants 26:03 AI Use Case 27:49 Future Proofing Decisions 37:17 Regulations 38:45 Bold Prediction 38:58 Mainframe 100 40:48 Career Advice 42:24 For Fun

LinkedIn: linkedin.com/in/tina-tarquinio Website: https://www.ibm.com/products/z

#MakingDataSimple #IBMz #Mainframe #LinuxONE #AIInferencing #SpyreAccelerator #HybridCloud #EnterpriseAI #DevOps #AICodeAssistant #EightNines #TinaTarquinio #MainframeModernization #AIUChip #FutureProofing #TechLeadership #WatsonxCodeAssistant #CloudComputing #TelumII

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Summary

In this episode, Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model—workflows, activities, task queues, and replay—and how it eliminates hand‑rolled retry, checkpoint, and error‑handling scaffolding while letting data remain where it lives. Preeti shares real-world patterns for replacing DAG-first orchestration, integrating application and data teams through signals and Nexus for cross-boundary calls, and using Temporal to coordinate long-running, human-in-the-loop, and agentic AI workflows with full observability and auditability. She also discusses heuristics for choosing Temporal alongside (or instead of) traditional orchestrators, managing scale without moving large datasets, and lessons from running durable execution as a cloud service.
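The replay mechanism at the heart of durable execution can be illustrated with a toy event-log sketch in plain Python. This is an illustration of the concept only, not Temporal's actual SDK; the `DurableRun` class and step names are invented for the example. The core idea: every completed step's result is persisted to a history log, and when a crashed workflow restarts, recorded steps are replayed from the log instead of re-executed, so side effects run exactly once.

```python
# Toy sketch of durable execution via event-log replay.
# Not Temporal's API -- just the concept it is built on.

class DurableRun:
    def __init__(self, history=None):
        self.history = list(history or [])    # persisted event log
        self._replay_len = len(self.history)  # entries to replay, not re-run
        self._pos = 0

    def step(self, name, fn):
        if self._pos < self._replay_len:
            # Replaying after a restart: return the recorded result
            # without re-executing the step's side effects.
            recorded_name, result = self.history[self._pos]
            assert recorded_name == name, "workflow code must be deterministic"
            self._pos += 1
            return result
        # First execution: run the step and persist its outcome.
        result = fn()
        self.history.append((name, result))
        self._pos += 1
        return result

calls = []
def charge():
    calls.append("charge")
    return "charged"
def ship():
    calls.append("ship")
    return "shipped"

# First run executes both steps and records them in the log.
run1 = DurableRun()
run1.step("charge", charge)
run1.step("ship", ship)

# A "crashed and restarted" run is handed the saved history and
# replays it: neither side effect fires a second time.
run2 = DurableRun(history=run1.history)
assert run2.step("charge", charge) == "charged"
assert run2.step("ship", ship) == "shipped"
assert calls == ["charge", "ship"]  # each side effect ran exactly once
```

In a real durable-execution system the history lives on a server rather than in process memory, which is what removes the hand-rolled retry and checkpoint scaffolding the episode describes, but the replay contract is the same: workflow code must be deterministic so replay lands in the identical state.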

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Your host is Tobias Macey and today I'm interviewing Preeti Somal about how to incorporate durable execution and state management into AI application architectures.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what durable execution is and how it impacts system architecture?
- With the strong focus on state maintenance and high reliability, what are some of the most impactful ways that data teams are incorporating tools like Temporal into their work?
- One of the core primitives in Temporal is a "workflow". How does that compare to similar primitives in common data orchestration systems such as Airflow, Dagster, Prefect, etc.?
- What are the heuristics that you recommend when deciding which tool to use for a given task, particularly in data/pipeline oriented projects?
- Even if a team is using a more data-focused orchestration engine, what are some of the ways that Temporal can be applied to handle the processing logic of the actual data?
- AI applications are also very dependent on reliable data to be effective in production contexts. What are some of the design patterns where durable execution can be integrated into RAG/agent applications?
- What are some of the conceptual hurdles that teams experience when they are starting to adopt Temporal or other durable execution frameworks?
- What are the most interesting, innovative, or unexpected ways that you have seen Temporal/durable execution used for data/AI services?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Temporal?
- When is Temporal/durable execution the wrong choice?
- What do you have planned for the future of Temporal for data and AI systems?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Temporal
- Durable Execution
- Flink
- Machine Learning Epoch
- Spark Streaming
- Airflow
- Directed Acyclic Graph (DAG)
- Temporal Nexus
- TensorZero
- AI Engineering Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Today, we’re joined by Chris McHenry, Chief Product Officer at Aviatrix, a cloud native network security company. We talk about:
- Prerequisites to driving operational efficiency with agentic AI
- Bridging the gap between security & engineering so organizations can go fast & be secure
- What’s required in order for agentic AI to create a magical moment
- With cloud powering so much of our society, the need to get security right
- The security challenges introduced by agentic AI apps, including new attack vectors

Summary

In this episode of the Data Engineering Podcast, Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems, only 50% trust their organization's data overall. Ariel explains why truly productionizing AI demands broader, continuously refreshed data with stronger automation and governance, and highlights the challenges posed by unstructured data and vector stores. The conversation covers the need to shift from manual reviews to automated pipelines, the resurgence of metadata and master data management, and the importance of guardrails, traceability, and agent governance. Ariel also predicts a growing convergence between data teams and application integration teams and advises leaders to focus on high-value use cases, aggressive pipeline automation, and cataloging and governing the coming sprawl of AI agents, all while using AI to accelerate data engineering itself.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Your host is Tobias Macey and today I'm interviewing Ariel Pohoryles about data management investments that organizations are making to enable them to scale AI implementations.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by describing the motivation and scope of your recent survey on data management investments for AI across your respondents?
- What are the key takeaways that were most significant to you?
- The survey reveals a fascinating paradox: 77% of leaders trust the data used by their AI systems, yet only half trust their organization's overall data quality. For our data engineering audience, what does this suggest about how companies are currently sourcing data for AI? Does it imply they are using narrow, manually-curated "golden datasets," and what are the technical challenges and risks of that approach as they try to scale?
- The report highlights a heavy reliance on manual data quality processes, with one expert noting companies feel it's "not reliable to fully automate validation" for external or customer data. At the same time, maturity in "Automated tools for data integration and cleansing" is low, at only 42%. What specific technical hurdles or organizational inertia are preventing teams from adopting more automation in their data quality and integration pipelines?
- There was a significant point made that with generative AI, "biases can scale much faster," making automated governance essential. From a data engineering perspective, how does the data management strategy need to evolve to support generative AI versus traditional ML models? What new types of data quality checks, lineage tracking, or monitoring for feedback loops are required when the model itself is generating new content based on its own outputs?
- The report champions a "centralized data management platform" as the "connective tissue" for reliable AI. How do you see scale and data maturity impacting the realities of that effort?
- How do architectural patterns in the shape of cloud warehouses, lakehouses, data mesh, data products, etc. factor into that need for centralized/unified platforms?
- A surprising finding was that a third of respondents have not fully grasped the risk of significant inaccuracies in their AI models if they fail to prioritize data management. In your experience, what are the biggest blind spots for data and analytics leaders?
- Looking at the maturity charts, companies rate themselves highly on "Developing a data management strategy" (65%) but lag significantly in areas like "Automated tools for data integration and cleansing" (42%) and "Conducting bias-detection audits" (24%). If you were advising a data engineering team lead based on these findings, what would you tell them to prioritize in the next 6-12 months to bridge the gap between strategy and a truly scalable, trustworthy data foundation for AI?
- The report states that 83% of companies expect to integrate more data sources for their AI in the next year. For a data engineer on the ground, what is the most important capability they need to build into their platform to handle this influx?
- What are the most interesting, innovative, or unexpected ways that you have seen teams addressing the new and accelerated data needs for AI applications?
- What are some of the noteworthy trends or predictions that you have for the near-term future of the impact that AI is having or will have on data teams and systems?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Boomi
- Data Management
- Integration & Automation Demo
- Agentstudio
- Data Connector Agent Webinar
- Survey Results
- Data Governance
- Shadow IT
- Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Summary

In this episode of the Data Engineering Podcast, Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand sides of data, highlighting that the bottleneck lies in the "middle layer" of curation, semantics, and serving. Omri and Ido outline a three-part framework for making data usable by LLMs and agents (collect, curate, serve) and share the challenges of scaling from POCs to production, including compounding error rates and reliability concerns. They also explore organizational shifts, patterns for managing context windows, pragmatic views on schema choices, and Upriver's approach to building autonomous data workflows using determinism and LLMs at the right boundaries. The conversation concludes with a look ahead to AI-first data platforms where engineers supervise business semantics while automation stitches technical details end-to-end.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Your host is Tobias Macey and today I'm interviewing Omri Lifshitz and Ido Bronstein about the challenges of keeping up with the demand for data when supporting AI systems.

Interview
- Introduction
- How did you get involved in the area of data management?
- We're here to talk about "The Growing Gap Between Data & AI". From your perspective, what is this gap, and why do you think it's widening so rapidly right now?
- How does this gap relate to the founding story of Upriver? What problems were you and your co-founders experiencing that led you to build this?
- The core premise of new AI tools, from RAG pipelines to LLM agents, is that they are only as good as the data they're given. How does this "garbage in, garbage out" problem change when the "in" is not a static file but a complex, high-velocity, and constantly changing data pipeline?
- Upriver is described as an "intelligent agent system" and an "autonomous data engineer." This is a fascinating "AI to solve for AI" approach. Can you describe this agent-based architecture and how it specifically works to bridge that data-AI gap?
- Your website mentions a "Data Context Layer" that turns "tribal knowledge" into a "machine-usable mode." This sounds critical for AI. How do you capture that context, and how does it make data "AI-ready" in a way that a traditional data catalog or quality tool doesn't?
- What are the most innovative or unexpected ways you've seen companies trying to make their data "AI-ready"? And where are the biggest points of failure you observe?
- What has been the most challenging or unexpected lesson you've learned while building an AI system (Upriver) that is designed to fix the data foundation for other AI systems?
- When is an autonomous, agent-based approach not the right solution for a team's data quality problems? What organizational or technical maturity is required to even start closing this data-AI gap?
- What do you have planned for the future of Upriver? And looking more broadly, how do you see this gap between data and AI evolving over the next few years?

Contact Info
- Ido - LinkedIn
- Omri - LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Upriver
- RAG == Retrieval Augmented Generation
- AI Engineering Podcast Episode
- AI Agent
- Context Window
- Model Finetuning

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

On today's Promoted Episode of Experiencing Data, I’m talking with Lucas Thelosen, CEO of Gravity and creator of Orion, an AI analyst transforming how data teams work. Lucas was head of PS for Looker, and eventually became Head of Product for Google’s Data and AI Cloud prior to starting his own data product company. We dig into how his team built Orion, the challenge of keeping AI accurate and trustworthy when doing analytical work, and how they’re thinking about the balance of human control with automation when their product acts as a force multiplier for human analysts.

In addition to talking about the product, we also talk about how Gravity arrived at specific enough use cases for this technology that a market would be willing to pay for, and how they’re thinking about pricing in today’s more “outcomes-based” environment. 

Incidentally, one thing I didn’t know when I first agreed to consider having Gravity and Lucas on my show was that Lucas has been a long-time proponent of data product management and operating with a product mindset. In this episode, he shares the “ah-hah” moment where things clicked for him around building data products in this manner. Lucas shares how pivotal this moment was for him, and how it helped accelerate his career from Looker to Google and now Gravity.

If you’re leading a data team, you’re a forward-thinking CDO, or you’re interested in commercializing your own analytics/AI product, my chat with Lucas should inspire you!  

Highlights/ Skip to:

Lucas’s breakthrough came when he embraced a data product management mindset (02:43)
How Lucas thinks about Gravity as being the instrumentalists in an orchestra, conducted by the user (04:31)
Finding product-market fit by solving for a common analytics pain point (08:11)
Analytics product and dashboard adoption challenges: why dashboards die and thinking of analytics as changing the business gradually (22:25)
What outcome-based pricing means for AI and analytics (32:08)
The challenge of defining guardrails and ethics for AI-based analytics products [just in case somebody wants to “fudge the numbers”] (46:03)
Lucas’s closing thoughts about what AI is unlocking for analysts and how to position your career for the future (48:35)

Special Bonus for DPLC Community Members

Are you a member of the Data Product Leadership Community? After our chat, I invited Lucas to come give a talk about his journey of moving from “data” to “product” and adopting a producty mindset for analytics and AI work. He was more than happy to oblige. Watch for this in late 2025/early 2026 on our monthly webinar and group discussion calendar.

Note: today’s episode is one of my rare Promoted Episodes. Please help support the show by visiting Gravity’s links below:

Quotes from Today’s Episode

“The whole point of data and analytics is to help the business evolve. When your reports make people ask new questions, that’s a win. If the conversations today sound different than they did three months ago, it means you’ve done your job, you’ve helped move the business forward.” — Lucas

“Accuracy is everything. The moment you lose trust, the business, the use case, it's all over. Earning that trust back takes a long time, so we made accuracy our number one design pillar from day one.” — Lucas 

“Language models have changed the game in terms of scale. Suddenly, we’re facing all these new kinds of problems, not just in AI, but in the old-school software sense too. Things like privacy, scalability, and figuring out who’s responsible.” — Brian

“Most people building analytics products have never been analysts, and that’s a huge disadvantage. If data doesn’t drive action, you’ve missed the mark. That’s why so many dashboards die quickly.” — Lucas

“Re: collecting feedback so you know if your UX is good: I generally agree that qualitative feedback is the best place to start, not analytics [on your analytics!] Especially in UX, analytics measure usage aspects of the product, not the subjective human experience. Experience is a collection of feelings and perceptions about how something went.” — Brian

Links

Gravity: https://www.bygravity.com
LinkedIn: https://www.linkedin.com/in/thelosen/
Email Lucas and team: [email protected]

The promise of AI in enterprise settings is enormous, but so are the privacy and security challenges. How do you harness AI's capabilities while keeping sensitive data protected within your organization's boundaries? Private AI—using your own models, data, and infrastructure—offers a solution, but implementation isn't straightforward. What governance frameworks need to be in place? How do you evaluate non-deterministic AI systems? When should you build in-house versus leveraging cloud services? As data and software teams evolve in this new landscape, understanding the technical requirements and workflow changes is essential for organizations looking to maintain control over their AI destiny.

Manasi Vartak is Chief AI Architect and VP of Product Management (AI Platform) at Cloudera. She is a product and AI leader with more than a decade of experience at the intersection of AI infrastructure, enterprise software, and go-to-market strategy. At Cloudera, she leads product and engineering teams building low-code and high-code generative AI platforms, driving the company’s enterprise AI strategy and enabling trusted AI adoption across global organizations. Before joining Cloudera through its acquisition of Verta, Manasi was the founder and CEO of Verta, where she transformed her MIT research into enterprise-ready ML infrastructure. She scaled the company to multi-million ARR, serving Fortune 500 clients in finance, insurance, and capital markets, and led the launch of enterprise MLOps and GenAI products used in mission-critical workloads. Manasi earned her PhD in Computer Science from MIT, where she pioneered model management systems such as ModelDB — foundational work that influenced the development of tools like MLflow. Earlier in her career, she held research and engineering roles at Twitter, Facebook, Google, and Microsoft.
In the episode, Richie and Manasi explore AI's role in financial services, the challenges of AI adoption in enterprises, the importance of data governance, the evolving skills needed for AI development, the future of AI agents, and much more.

Links Mentioned in the Show:
Cloudera
Cloudera Evolve Conference
Cloudera Agent Studio
Connect with Manasi
Course: Introduction to AI Agents
Related Episode: RAG 2.0 and The New Era of RAG Agents with Douwe Kiela, CEO at Contextual AI & Adjunct Professor at Stanford University
Rewatch RADAR AI
New to DataCamp? Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business

Summary In this episode of the Data Engineering Podcast Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials, and access control in modern data platforms. With the shift to composable ecosystems, integration burdens have exploded, fracturing governance and auditability across warehouses, lakes, files, vector stores, and streaming systems. Matt shares practical solutions, including propagating user identity via JWTs, externalizing policy with engines like OPA/Rego and Cedar, and using database proxies for native row/column security. He also explores catalog-driven governance, lineage-based label propagation, and OpenTDF for binding policies to data objects. The conversation covers machine-to-machine access, short-lived credentials, workload identity, and constraining access by interface choke points, as well as lessons from Zanzibar-style policy models and the human side of enforcement. Matt emphasizes the need for trust composition - unifying provenance, policy, and identity context - to answer questions about data access, usage, and intent across the entire data path.
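The summary above mentions propagating user identity via JWTs and externalizing policy decisions. Here is a minimal, hypothetical Python sketch of that pattern (not UberEther's implementation, and deliberately simpler than OPA/Rego or Cedar): decode the claims from a JWT and use them to drive a row-level filtering decision. The claim names ("roles", "region") are illustrative assumptions.

```python
import base64
import json

def decode_claims(token: str) -> dict:
    """Return the claims segment of a JWT.

    NOTE: this skips signature verification for brevity; a real system
    must verify the token against the issuer's keys first.
    """
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def allowed_rows(claims: dict, rows: list) -> list:
    """Toy externalized policy: admins see everything; everyone else
    sees only rows tagged with their own region."""
    if "admin" in claims.get("roles", []):
        return list(rows)
    return [r for r in rows if r.get("region") == claims.get("region")]
```

The point of keeping `allowed_rows` separate from the query path is the same one Matt makes about policy engines: the decision logic can be swapped or centrally managed without touching every warehouse, lake, or streaming consumer that enforces it.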

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. 
Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Matt Topper about the challenges of managing identity and access controls in the context of data systems.

Interview
Introduction
How did you get involved in the area of data management?
The data ecosystem is a uniquely challenging space for creating and enforcing technical controls for identity and access control. What are the key considerations for designing a strategy for addressing those challenges?
For data access the off-the-shelf options are typically on either extreme of too coarse or too granular in their capabilities. What do you see as the major factors that contribute to that situation?
Data governance policies are often used as the primary means of identifying what data can be accessed by whom, but translating that into enforceable constraints is often left as a secondary exercise. How can we as an industry make that a more manageable and sustainable practice?
How can the audit trails that are generated by data systems be used to inform the technical controls for identity and access?
How can the foundational technologies of our data platforms be improved to make identity and authz a more composable primitive?
How does the introduction of streaming/real-time data ingest and delivery complicate the challenges of security controls?
What are the most interesting, innovative, or unexpected ways that you have seen data teams address ICAM?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on ICAM?
What are the aspects of ICAM in data systems that you are paying close attention to?
What are your predictions for the industry adoption or enforcement of those controls?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
UberEther
JWT == JSON Web Token
OPA == Open Policy Agent
Rego
PingIdentity
Okta
Microsoft Entra
SAML == Security Assertion Markup Language
OAuth
OIDC == OpenID Connect
IDP == Identity Provider
Kubernetes
Istio
Amazon CEDAR policy language
AWS IAM
PII == Personally Identifiable Information
CISO == Chief Information Security Officer
OpenTDF
OpenFGA
Google Zanzibar
Risk Management Framework
Model Context Protocol
Google Data Project
TPM == Trusted Platform Module
PKI == Public Key Infrastructure
Passkeys
DuckLake
Podcast Episode
Accumulo
JDBC
OpenBao
Hashicorp Vault
LDAP
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

As we count down to the 100th episode of Data Unchained, we’re revisiting one of the conversations that perfectly captures the spirit of this show: how data mobility is transforming business. In this look-back episode, host Molly Presley welcomes Harry Carr, CEO of Vcinity, for a deep dive into the technology that’s redefining how enterprises access and move data across distributed environments. Harry explains why hybrid cloud exists, how Vcinity accelerates data access without duplication or compression, and why the future of data architecture lies in making data available anywhere—instantly. From connecting global AI workflows to eliminating the need to move massive datasets, this episode explores what true “data anti-gravity” looks like and how it’s reshaping the modern enterprise. Listen as Molly and Harry discuss the evolution of data architectures, the synergy between Hammerspace and Vcinity, and what it means to build a world where applications and data connect seamlessly, no matter where they live. Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information.

Summary In this episode Kate Shaw, Senior Product Manager for Data and SLIM at SnapLogic, talks about the hidden and compounding costs of maintaining legacy systems—and practical strategies for modernization. She unpacks how “legacy” is less about age and more about when a system becomes a risk: blocking innovation, consuming excess IT time, and creating opportunity costs. Kate explores technical debt, vendor lock-in, lost context from employee turnover, and the slippery notion of “if it ain’t broke,” especially when data correctness and lineage are unclear. She digs into governance, observability, and data quality as foundations for trustworthy analytics and AI, and why exit strategies for system retirement should be planned from day one. The discussion covers composable architectures to avoid monoliths and big-bang migrations, how to bridge valuable systems into AI initiatives without lock-in, and why clear success criteria matter for AI projects. Kate shares lessons from the field on discovery, documentation gaps, parallel run strategies, and using integration as the connective tissue to unlock data for modern, cloud-native and AI-enabled use cases. She closes with guidance on planning migrations, defining measurable outcomes, ensuring lineage and compliance, and building for swap-ability so teams can evolve systems incrementally instead of living with a “bowl of spaghetti.”
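One of the field lessons mentioned above is the parallel run strategy. A hypothetical Python sketch of the idea (not SnapLogic's tooling): while the legacy pipeline and its replacement run side by side, compare their outputs keyed by record id before retiring the old system. The key name "id" is an illustrative assumption.

```python
def compare_parallel_run(legacy_rows, new_rows, key="id"):
    """Summarize divergence between two parallel pipeline runs."""
    legacy = {r[key]: r for r in legacy_rows}
    new = {r[key]: r for r in new_rows}
    return {
        "missing": sorted(set(legacy) - set(new)),  # dropped by new system
        "extra": sorted(set(new) - set(legacy)),    # only in new system
        "mismatched": sorted(
            k for k in set(legacy) & set(new) if legacy[k] != new[k]
        ),
    }
```

A report of all-empty lists over a representative window is one concrete, measurable success criterion for declaring the new system equivalent and scheduling the legacy retirement.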

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? 
Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Kate Shaw about the true costs of maintaining legacy systems.

Interview
Introduction
How did you get involved in the area of data management?
What are your criteria for when a given system or service transitions to being "legacy"?
In order for any service to survive long enough to become "legacy" it must be serving its purpose and providing value. What are the common factors that prompt teams to deprecate or migrate systems?
What are the sources of monetary cost related to maintaining legacy systems while they remain operational?
Beyond monetary cost, economics also has a concept of "opportunity cost". What are some of the ways that manifests in data teams who are maintaining or migrating from legacy systems?
How does that loss of productivity impact the broader organization?
How does the process of migration contribute to issues around data accuracy, reliability, etc., as well as contributing to potential compromises of security and compliance?
Once a system has been replaced, it needs to be retired. What are some of the costs associated with removing a system from service?
What are the most interesting, innovative, or unexpected ways that you have seen teams address the costs of legacy systems and their retirement?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on legacy systems migration?
When is deprecation/migration the wrong choice?
How have evolutionary architecture patterns helped to mitigate the costs of system retirement?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
SnapLogic
SLIM == SnapLogic Intelligent Modernizer
Opportunity Cost
Sunk Cost Fallacy
Data Governance
Evolutionary Architecture
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Brought to You By: •⁠ Statsig ⁠ — ⁠ The unified platform for flags, analytics, experiments, and more. Something interesting is happening with the latest generation of tech giants. Rather than building advanced experimentation tools themselves, companies like Anthropic, Figma, Notion and a bunch of others… are just using Statsig. Statsig has rebuilt this entire suite of data tools that was available at maybe 10 or 15 giants until now. Check out Statsig. •⁠ Linear – The system for modern product development. Linear is just so fast to use – and it enables velocity in product workflows. Companies like Perplexity and OpenAI have already switched over, because simplicity scales. Go ahead and check out Linear and see why it feels like a breeze to use. — What is it really like to be an engineer at Google? In this special deep dive episode, we unpack how engineering at Google actually works. We spent months researching the engineering culture of the search giant, and talked with 20+ current and former Googlers to bring you this deepdive with Elin Nilsson, tech industry researcher for The Pragmatic Engineer and a former Google intern. Google has always been an engineering-driven organization. We talk about its custom stack and tools, the design-doc culture, and the performance and promotion systems that define career growth. We also explore the culture that feels built for engineers: generous perks, a surprisingly light on-call setup often considered the best in the industry, and a deep focus on solving technical problems at scale. If you are thinking about applying to Google or are curious about how the company’s engineering culture has evolved, this episode takes a clear look at what it was like to work at Google in the past versus today, and who is a good fit for today’s Google. 
Jump to interesting parts: (13:50) Tech stack (1:05:08) Performance reviews (GRAD) (2:07:03) The culture of continuously rewriting things — Timestamps (00:00) Intro (01:44) Stats about Google (11:41) The shared culture across Google (13:50) Tech stack (34:33) Internal developer tools and monorepo (43:17) The downsides of having so many internal tools at Google (45:29) Perks (55:37) Engineering roles (1:02:32) Levels at Google  (1:05:08) Performance reviews (GRAD) (1:13:05) Readability (1:16:18) Promotions (1:25:46) Design docs (1:32:30) OKRs (1:44:43) Googlers, Nooglers, ReGooglers (1:57:27) Google Cloud (2:03:49) Internal transfers (2:07:03) Rewrites (2:10:19) Open source (2:14:57) Culture shift (2:31:10) Making the most of Google, as an engineer (2:39:25) Landing a job at Google — The Pragmatic Engineer deepdives relevant for this episode: •⁠ Inside Google’s engineering culture •⁠ Oncall at Google •⁠ Performance calibrations at tech companies •⁠ Promotions and tooling at Google •⁠ How Kubernetes is built •⁠ The man behind the Big Tech comics: Google cartoonist Manu Cornet — Production and marketing by ⁠⁠⁠⁠⁠⁠⁠⁠https://penname.co/⁠⁠⁠⁠⁠⁠⁠⁠. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Welcome to another episode of Data Unchained hosted by Molly Presley. In this episode, Molly sits down with Ted Weatherford, VP of Business Development at Xsight Labs, to explore the cutting-edge future of data infrastructure. They dive deep into the rise of DPUs (Data Processing Units), how they compare to GPUs, and why 800G networking could change the balance of supercomputing forever. From AI data centers and neo cloud architectures to the Open Flash Platform initiative, this conversation uncovers how efficiency, scalability, and democratization of technology are shaping the next generation of computing. Find out more about Xsight Labs by visiting their website: https://xsightlabs.com/ Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information.

Summary In this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting how access friction and idle infrastructure slow progress. Join them as they discuss Flex AI's innovative approach to simplifying heterogeneous compute, standardizing on consistent Kubernetes layers, and abstracting inference across various accelerators, allowing teams to iterate faster without wrestling with drivers, libraries, or cloud-by-cloud differences. Brijesh also shares insights into Flex AI's strategies for lifting utilization, protecting real-time workloads, and spanning the full lifecycle from fine-tuning to autoscaled inference, all while keeping complexity at bay.

Pre-amble I hope you enjoy this cross-over episode of the AI Engineering Podcast, another show that I run to act as your guide to the fast-moving world of building scalable and maintainable AI systems. As generative AI models have grown more powerful and are being applied to a broader range of use cases, the lines between data and AI engineering are becoming increasingly blurry. The responsibilities of data teams are being extended into the realm of context engineering, as well as designing and supporting new infrastructure elements that serve the needs of agentic applications. This episode is an example of the types of work that are not easily categorized into one or the other camp.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. 
Your host is Tobias Macey and today I'm interviewing Brijesh Tripathi about FlexAI, a platform offering a service-oriented abstraction for AI workloads.

Interview
Introduction
How did you get involved in machine learning?
Can you describe what FlexAI is and the story behind it?
What are some examples of the ways that infrastructure challenges contribute to friction in developing and operating AI applications?
How do those challenges contribute to issues when scaling new applications/businesses that are founded on AI?
There are numerous managed services and deployable operational elements for operationalizing AI systems. What are some of the main pitfalls that teams need to be aware of when determining how much of that infrastructure to own themselves?
Orchestration is a key element of managing the data and model lifecycles of these applications. How does your approach of "workload as a service" help to mitigate some of the complexities in the overall maintenance of that workload?
Can you describe the design and architecture of the FlexAI platform?
How has the implementation evolved from when you first started working on it?
For someone who is going to build on top of FlexAI, what are the primary interfaces and concepts that they need to be aware of?
Can you describe the workflow of going from problem to deployment for an AI workload using FlexAI?
One of the perennial challenges of making a well-integrated platform is that there are inevitably pre-existing workloads that don't map cleanly onto the assumptions of the vendor. What are the affordances and escape hatches that you have built in to allow partial/incremental adoption of your service?
What are the elements of AI workloads and applications that you are explicitly not trying to solve for?
What are the most interesting, innovative, or unexpected ways that you have seen FlexAI used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on FlexAI?
When is FlexAI the wrong choice?
What do you have planned for the future of FlexAI?

Contact Info
LinkedIn

Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?

Links
Flex AI
Aurora Super Computer
CoreWeave
Kubernetes
CUDA
ROCm
Tensor Processing Unit (TPU)
PyTorch
Triton
Trainium
ASIC == Application Specific Integrated Circuit
SOC == System On a Chip
Loveable
FlexAI Blueprints
Tenstorrent
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Brought to You By: •⁠ Statsig ⁠ — ⁠ The unified platform for flags, analytics, experiments, and more. Statsig built a complete set of data tools that allow engineering teams to measure the impact of their work. This toolkit is SO valuable to so many teams, that OpenAI - who was a huge user of Statsig - decided to acquire the company, the news announced last week. Talk about validation! Check out Statsig. •⁠ Linear – The system for modern product development. Here’s an interesting story: OpenAI switched to Linear as a way to establish a shared vocabulary between teams. Every project now follows the same lifecycle, uses the same labels, and moves through the same states. Try Linear for yourself. — What does it take to do well at a hyper-growth company? In this episode of The Pragmatic Engineer, I sit down with Charles-Axel Dein, one of the first engineers at Uber, who later hired me there. Since then, he’s gone on to work at CloudKitchens. He’s also been maintaining the popular Professional programming reading list GitHub repo for 15 years, where he collects articles that made him a better programmer.  In our conversation, we dig into what it’s really like to work inside companies that grow rapidly in scale and headcount. Charles shares what he’s learned about personal productivity, project management, incidents, interviewing, plus how to build flexible skills that hold up in fast-moving environments.  
Jump to interesting parts: • 10:41 – the reality of working inside a hyperscale company • 41:10 – the traits of high-performing engineers • 1:03:31 – Charles’ advice for getting hired in today’s job market We also discuss: • How to spot the signs of hypergrowth (and when it’s slowing down) • What sets high-performing engineers apart beyond shipping • Charles’s personal productivity tips, favorite reads, and how he uses reading to uplevel his skills • Strategic tips for building your resume and interviewing  • How imposter syndrome is normal, and how leaning into it helps you grow • And much more! If you’re at a fast-growing company, considering joining one, or looking to land your next role, you won’t want to miss this practical advice on hiring, interviewing, productivity, leadership, and career growth. — Timestamps (00:00) Intro (04:04) Early days at Uber as engineer #20 (08:12) CloudKitchens’ similarities with Uber (10:41) The reality of working at a hyperscale company (19:05) Tenancies and how Uber deployed new features (22:14) How CloudKitchens handles incidents (26:57) Hiring during fast-growth (34:09) Avoiding burnout (38:55) The popular Professional programming reading list repo (41:10) The traits of high-performing engineers  (53:22) Project management tactics (1:03:31) How to get hired as a software engineer (1:12:26) How AI is changing hiring (1:19:26) Unexpected ways to thrive in fast-paced environments (1:20:45) Dealing with imposter syndrome  (1:22:48) Book recommendations  (1:27:26) The problem with survival bias  (1:32:44) AI’s impact on software development  (1:42:28) Rapid fire round — The Pragmatic Engineer deepdives relevant for this episode: •⁠ Software engineers leading projects •⁠ The Platform and Program split at Uber •⁠ Inside Uber’s move to the Cloud •⁠ How Uber built its observability platform •⁠ From Software Engineer to AI Engineer – with Janvi Kalra — Production and marketing by ⁠⁠⁠⁠⁠⁠⁠⁠https://penname.co/⁠⁠⁠⁠⁠⁠⁠⁠. 
For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

In this episode, I sat down with tech humanist Kate O’Neill to explore how organizations can balance speed with human-centered design at a time when everyone is racing to leverage AI in their businesses. Kate introduced her “Now–Next Continuum,” a framework that distinguishes digital transformation (catching up) from true innovation (looking ahead). We dug into the real-world challenges and tensions of moving fast vs. creating impact with AI, how ethics fits into decision making, and the role of data in making informed decisions.

Kate stressed the importance of organizations defining clear purpose statements and values from the outset, shared the proxy metrics she uses to gauge human-friendliness, and recommended applying a “harms of action vs. harms of inaction” lens for ethical decisions. Her key point: human-centered approaches to AI and technology creation aren’t slow; they create intentional structures that speed up smart choices while avoiding costly missteps.

Highlights/ Skip to:

How Kate approaches discussions with executives about moving fast, but also moving in a human-centered way when building out AI solutions (1:03) Exploring the lack of technical backgrounds among many CEOs and how this shapes the way organizations make big decisions around technical solutions (3:58)  FOMO and the “Solution in Search of a Problem” problem in Data (5:18)  Why ongoing ethnographic research and direct exposure to users are essential for true innovation (11:21)  Balancing organizational purpose and human-centered tech decisions, and why a defined purpose must precede these decisions (18:09) How organizations can define, measure, operationalize, and act on ethical considerations in AI and data products (35:57) Risk management vs. strategic optimism: balancing risk reduction with embracing the art of the possible when building AI solutions (43:54)

Quotes from Today’s Episode "I think the ethics and the governance and all those kinds of discussions [about the implications of digital transformation] are all very big word - kind of jargon-y kinds of discussions - that are easy to think aren't important, but what they all tend to come down to is that alignment between what the business is trying to do and what the person on the other side of the business is trying to do." –Kate O’Neill

" I've often heard the term digital transformation used almost interchangeably with the term innovation. And I think that that's a grave disservice that we do to those two concepts because they're very different. Digital transformation, to me, seems as if it sits much more comfortably on the earlier side of the Now-Next Continuum. So, it's about moving the past to the present… Innovation is about standing in the present and looking to the future and thinking about the art of the possible, like you said. What could we do? What could we extract from this unstructured data (this mess of stuff that’s something new and different) that could actually move us into green space, into territory that no one’s doing yet? And those are two very different sets of questions. And in most organizations, they need to be happening simultaneously." –Kate O’Neill

"The reason I chose human-friendly [as a term] over human-centered [is] partly because I wanted to be very honest about the goal and not fall back into, you know, jargony kinds of language that, you know, you and I and the folks listening probably all understand in a certain way, but [not] the CEOs and the folks that I'm necessarily trying to get reading this book and make their decisions in a different way based on it." –Kate O’Neill

“We love coming up with new names for different things. Like whether something is “cloud,” or whether it’s like, you know, “SaaS,” or all these different terms that we’ve come up with over the years… After spending so long working in tech, it is kind of fun to laugh at it. But it’s nice that there’s a real earnestness [to it]. That’s sort of evergreen [laugh]. People are always trying to genuinely solve human problems, which is what I try to tap into these days, with the work that I do, is really trying to help businesses—business leaders, mostly, but a lot of those are non-tech leaders, and I think that’s where this really sticks is that you get a lot of people who have ascended into CEO or other C-suite roles who don’t come from a technology background.” 

–Kate O’Neill

"My feeling is that if you're not regularly doing ethnographic research and having a lot of exposure time directly to customers, you’re doomed. The people—the makers—have to be exposed to the users and stakeholders.  There has to be ongoing work in this space; it can't just be about defining project requirements and then disappearing. However, I don't see a lot of data teams and AI teams that have non-technical research going on where they're regularly spending time with end users or customers such that they could even imagine what the art of the possible could be.”

–Brian T. O’Neill

Links

KO Insights: https://www.koinsights.com/
LinkedIn for Kate O’Neill: https://www.linkedin.com/in/kateoneill/
Kate O’Neill Book: What Matters Next: A Leader's Guide to Making Human-Friendly Tech Decisions in a World That's Moving Too Fast

At Berlin Buzzwords, industry voices highlighted how search is evolving with AI and LLMs.

  • Kacper Łukawski (Qdrant) stressed hybrid search (semantic + keyword) as core for RAG systems and promoted efficient embedding models for smaller-scale use.
  • Manish Gill (ClickHouse) discussed auto-scaling OLAP databases on Kubernetes, combining infrastructure and database knowledge.
  • André Charton (Kleinanzeigen) reflected on scaling search for millions of classifieds, moving from Solr/Elasticsearch toward vector search, while returning to a hands-on technical role.
  • Filip Makraduli (Superlinked) introduced a vector-first framework that fuses multiple encoders into one representation for nuanced e-commerce and recommendation search.
  • Brian Goldin (Voyager Search) emphasized spatial context in retrieval, combining geospatial data with AI enrichment to add the “where” to search.
  • Atita Arora (Voyager Search) highlighted geospatial AI models, the renewed importance of retrieval in RAG, and the cautious but promising rise of AI agents.

Together, their perspectives show a common thread: search is regaining center stage in AI—scaling, hybridization, multimodality, and domain-specific enrichment are shaping the next generation of retrieval systems.
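The “hybrid search” several speakers mention can be sketched concretely. One common fusion technique is Reciprocal Rank Fusion (RRF), which merges a keyword ranking (e.g. BM25) with a semantic ranking (vector similarity) without requiring comparable scores. This is a minimal illustration under that assumption, not Qdrant’s actual implementation, and the document IDs are hypothetical:

```python
# Minimal sketch of hybrid search via Reciprocal Rank Fusion (RRF).
# Each document's fused score is the sum over rankings of 1 / (k + rank),
# so items ranked highly by either retrieval method rise to the top.

def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc IDs into one hybrid ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort doc IDs by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from a keyword index and a vector index:
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_c", "doc_a", "doc_d"]

print(rrf_fuse([keyword_hits, semantic_hits]))
```

Because RRF operates on ranks rather than raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on different scales.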

Kacper Łukawski
Senior Developer Advocate at Qdrant, he educates users on vector and hybrid search. He highlighted Qdrant’s support for dense and sparse vectors, the role of search with LLMs, and his interest in cost-effective models like static embeddings for smaller companies and edge apps.
Connect: https://www.linkedin.com/in/kacperlukawski/

Manish Gill
Engineering Manager at ClickHouse, he spoke about running ClickHouse on Kubernetes, tackling auto-scaling and stateful sets. His team focuses on making ClickHouse scale automatically in the cloud. He credited its speed to careful engineering and reflected on the shift from IC to manager.
Connect: https://www.linkedin.com/in/manishgill/

André Charton
Head of Search at Kleinanzeigen, he discussed shaping the company’s search tech—moving from Solr to Elasticsearch and now vector search with Vespa. Kleinanzeigen handles 60M items, 1M new listings daily, and 50k requests/sec. André explained his career shift back to hands-on engineering.
Connect: https://www.linkedin.com/in/andrecharton/

Filip Makraduli
Founding ML DevRel engineer at Superlinked, an open-source framework for AI search and recommendations. Its vector-first approach fuses multiple encoders (text, images, structured fields) into composite vectors for single-shot retrieval. His Berlin Buzzwords demo showed e-commerce search with natural-language queries and filters.
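As a rough illustration of the “vector-first” composite idea (not Superlinked’s actual API), per-modality embeddings can be normalized, weighted, and concatenated so a single nearest-neighbor query spans text, image, and structured signals at once. The encoder outputs and weights below are stand-ins:

```python
# Sketch of building a composite vector from several encoders.
import math

def l2_normalize(vec):
    """Scale a vector to unit length (no-op on the zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def composite_vector(parts, weights):
    """Concatenate per-modality embeddings, each normalized then weighted,
    so the weight controls how much each modality influences distances."""
    out = []
    for vec, w in zip(parts, weights):
        out.extend(w * x for x in l2_normalize(vec))
    return out

text_emb = [0.2, 0.8]   # stand-in text-encoder output
price_emb = [1.0]       # stand-in encoding of a structured field
vec = composite_vector([text_emb, price_emb], weights=[0.7, 0.3])
print(len(vec))  # composite spans both encoder spaces
```

With all items embedded this way, one similarity search over the composite space replaces separate per-modality queries plus a re-ranking step.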
Connect: https://www.linkedin.com/in/filipmakraduli/

Brian Goldin
Founder and CEO of Voyager Search, which began with geospatial search and expanded into documents and metadata enrichment. Voyager indexes spatial data and enriches pipelines with NLP, OCR, and AI models to detect entities like oil spills or windmills. He stressed adding spatial context (“the where”) as critical for search and highlighted Voyager’s 12 years of enterprise experience.
Connect: https://www.linkedin.com/in/brian-goldin-04170a1/

Atita Arora
Director of AI at Voyager Search, with nearly 20 years in retrieval systems, now focused on geospatial AI for Earth observation data. At Berlin Buzzwords she hosted sessions, attended talks on Lucene, GPUs, and Solr, and emphasized retrieval quality in RAG systems. She is cautiously optimistic about AI agents and values the event as both learning hub and professional reunion.
Connect: https://www.linkedin.com/in/atitaarora/