Managing dbt for 150 analytics engineers meant evolving from fragmented dbt Core projects to unified standards and migrating to dbt Cloud. We addressed security risks and inconsistent practices through standardization and centralized workflows, while keeping our Airflow orchestration in place. Challenges remain in balancing governance with analyst autonomy at scale.
Topic: Cloud Computing (talk-data.com)
The lakehouse promised to unify our data, but popular formats can feel bloated and hard to use for most real-world workloads. If you've ever felt that the complexity and operational overhead of "Big Data" tools are overkill, you're not alone. What if your lakehouse could be simple, fast, and maybe even a little fun? Enter DuckLake, the native lakehouse format, managed on MotherDuck. It delivers the powerful features you need, like ACID transactions, time travel, and schema evolution, without the heavyweight baggage. This approach truly makes massive data sets feel like Small Data. This workshop is a practical, step-by-step walkthrough for the data practitioner. We'll get straight to the point and show you how to build a fully functional, serverless lakehouse from scratch.

You will learn:
- The Architecture: We'll explore how DuckLake's design choices make it fundamentally simpler and faster for analytical queries compared to its JVM-based cousins.
- The Workflow: Through hands-on examples, you'll create a DuckLake table, perform atomic updates, and use time travel, all with the simple SQL you already know (see the sketch below).
- The MotherDuck Advantage: Discover how the serverless platform makes it easy to manage, share, and query your DuckLake tables, enabling a seamless hybrid workflow between your laptop and the cloud.
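As a taste of that workflow, here is a minimal sketch (not the workshop's material) of driving a local DuckLake catalog from DuckDB's Python API. The file names and table are hypothetical, and the ATTACH options and AT (VERSION => ...) time-travel clause follow the public DuckLake documentation as best I recall, so verify them against the current release:

```python
# A minimal sketch, not the workshop's actual code: a local DuckLake catalog
# driven through DuckDB's Python API. File names and the table are made up;
# the ATTACH options and AT (VERSION => ...) clause are assumptions based on
# the public DuckLake docs.
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# Attach a DuckLake catalog: metadata in a local file, data files in a folder.
con.execute("ATTACH 'ducklake:demo_metadata.ducklake' AS lake (DATA_PATH 'lake_files/')")
con.execute("USE lake")

# Create a table and make atomic changes; each statement becomes a snapshot.
con.execute("CREATE TABLE orders (id INTEGER, amount DOUBLE)")
con.execute("INSERT INTO orders VALUES (1, 10.0), (2, 25.5)")
con.execute("UPDATE orders SET amount = 30.0 WHERE id = 2")

# Time travel: read the table as of an earlier snapshot
# (version numbers depend on the catalog's history).
print(con.sql("SELECT * FROM orders AT (VERSION => 2)").fetchall())
```

On MotherDuck the same SQL runs against a managed catalog, which is what enables the laptop-to-cloud hybrid workflow the session highlights.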
Summary
In this episode of the Data Engineering Podcast, Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand sides of data, highlighting that the bottleneck lies in the "middle layer" of curation, semantics, and serving. Omri and Ido outline a three-part framework for making data usable by LLMs and agents (collect, curate, serve) and share the challenges of scaling from POCs to production, including compounding error rates and reliability concerns. They also explore organizational shifts, patterns for managing context windows, pragmatic views on schema choices, and Upriver's approach to building autonomous data workflows using determinism and LLMs at the right boundaries. The conversation concludes with a look ahead to AI-first data platforms where engineers supervise business semantics while automation stitches technical details end-to-end.
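The collect/curate/serve framing lends itself to a small illustration. The sketch below shows one way a "serve" layer might pack curated catalog entries into an LLM's context window under a deterministic token budget; the budget, relevance scores, and characters-per-token heuristic are assumptions for illustration only, not Upriver's implementation:

```python
# Illustrative sketch of the "serve" step in a collect/curate/serve flow:
# pack curated catalog entries into an LLM context window under a fixed
# token budget. Budget, relevance scores, and the characters-per-token
# heuristic are assumptions, not Upriver's implementation.
def pack_context(records, token_budget=4000):
    """Greedily add the highest-relevance curated records until the budget is hit."""
    def estimated_tokens(text):
        return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

    ranked = sorted(records, key=lambda r: r["relevance"], reverse=True)
    chunks, used = [], 0
    for record in ranked:
        snippet = f"{record['name']}: {record['description']}"
        cost = estimated_tokens(snippet)
        if used + cost > token_budget:
            break  # deterministic boundary: never overflow the context window
        chunks.append(snippet)
        used += cost
    return "\n".join(chunks)

# Example usage with curated metadata about two tables.
catalog = [
    {"name": "orders", "description": "one row per order, joined to customers", "relevance": 0.9},
    {"name": "clickstream_raw", "description": "unsampled web events, not curated", "relevance": 0.4},
]
print(pack_context(catalog, token_budget=30))
```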
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming: Prefect runs it all, from ingestion to activation, in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.
Your host is Tobias Macey, and today I'm interviewing Omri Lifshitz and Ido Bronstein about the challenges of keeping up with the demand for data when supporting AI systems.

Interview
- Introduction
- How did you get involved in the area of data management?
- We're here to talk about "The Growing Gap Between Data & AI". From your perspective, what is this gap, and why do you think it's widening so rapidly right now?
- How does this gap relate to the founding story of Upriver? What problems were you and your co-founders experiencing that led you to build this?
- The core premise of new AI tools, from RAG pipelines to LLM agents, is that they are only as good as the data they're given. How does this "garbage in, garbage out" problem change when the "in" is not a static file but a complex, high-velocity, and constantly changing data pipeline?
- Upriver is described as an "intelligent agent system" and an "autonomous data engineer." This is a fascinating "AI to solve for AI" approach. Can you describe this agent-based architecture and how it specifically works to bridge that data-AI gap?
- Your website mentions a "Data Context Layer" that turns "tribal knowledge" into a "machine-usable mode." This sounds critical for AI. How do you capture that context, and how does it make data "AI-ready" in a way that a traditional data catalog or quality tool doesn't?
- What are the most innovative or unexpected ways you've seen companies trying to make their data "AI-ready"? And where are the biggest points of failure you observe?
- What has been the most challenging or unexpected lesson you've learned while building an AI system (Upriver) that is designed to fix the data foundation for other AI systems?
- When is an autonomous, agent-based approach not the right solution for a team's data quality problems? What organizational or technical maturity is required to even start closing this data-AI gap?
- What do you have planned for the future of Upriver? And looking more broadly, how do you see this gap between data and AI evolving over the next few years?

Contact Info
- Ido - LinkedIn
- Omri - LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Upriver
- RAG == Retrieval Augmented Generation
- AI Engineering Podcast Episode
- AI Agent
- Context Window
- Model Finetuning

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
The AI landscape is evolving beyond gigantic models like GPT-4 towards a new generation of small, smart, and specialised models that can run privately, securely and efficiently on everyday devices. In this talk, Mehmood explores how these compact models, trained on domain-specific data, deliver powerful performance while reducing energy costs, improving privacy, and removing the need for constant cloud access. From customer service chatbots that understand regional dialects to intelligent on-device assistants in healthcare and retail, discover how small AI is making intelligence more sustainable, secure, and accessible for businesses of all sizes.
Breakout session focusing on VPCs for secure cross-cloud apps on Cloudflare Workers.
Join Toyota Motor Europe to discover their journey towards a fully operationalized Data Mesh with dbt and Snowflake.
TME (Toyota Motor Europe), one of the biggest automobile manufacturers, oversees the wholesale sales and marketing of Toyota and Lexus vehicles in Europe. This session will showcase how dbt Cloud and Snowflake are supporting their data strategy.
They will elaborate on the challenges faced along the way and on how their platform supports their future vision: enabling advanced real-time analytics, scaling while maintaining governance and best practices, and building a strong data foundation to launch their AI/ML initiatives.
Join Martin Frederik, Maria Wiss, Grace Adamson, and Dash Desai as they kick off Snowflake World Tour Amsterdam, share the vision for the AI Data Cloud and customer stories, and make exciting product announcements.
Best practices for Datadog configuration at scale: a Volkswagen Group case study.
On today's Promoted Episode of Experiencing Data, I’m talking with Lucas Thelosen, CEO of Gravity and creator of Orion, an AI analyst transforming how data teams work. Lucas was head of PS for Looker, and eventually became Head of Product for Google’s Data and AI Cloud prior to starting his own data product company. We dig into how his team built Orion, the challenge of keeping AI accurate and trustworthy when doing analytical work, and how they’re thinking about the balance of human control with automation when their product acts as a force multiplier for human analysts.
In addition to talking about the product, we also talk about how Gravity arrived at specific enough use cases for this technology that a market would be willing to pay for, and how they’re thinking about pricing in today’s more “outcomes-based” environment.
Incidentally, one thing I didn’t know when I first agreed to consider having Gravity and Lucas on my show was that Lucas has been a long-time proponent of data product management and operating with a product mindset. In this episode, he shares the “ah-hah” moment when things clicked for him around building data products in this manner, how pivotal that moment was, and how it helped accelerate his career from Looker to Google and now Gravity.
If you’re leading a data team, you’re a forward-thinking CDO, or you’re interested in commercializing your own analytics/AI product, my chat with Lucas should inspire you!
Highlights / Skip to:
- Lucas’s breakthrough came when he embraced a data product management mindset (02:43)
- How Lucas thinks about Gravity as being the instrumentalists in an orchestra, conducted by the user (04:31)
- Finding product-market fit by solving for a common analytics pain point (08:11)
- Analytics product and dashboard adoption challenges: why dashboards die, and thinking of analytics as changing the business gradually (22:25)
- What outcome-based pricing means for AI and analytics (32:08)
- The challenge of defining guardrails and ethics for AI-based analytics products [just in case somebody wants to “fudge the numbers”] (46:03)
- Lucas’ closing thoughts about what AI is unlocking for analysts and how to position your career for the future (48:35)
Special Bonus for DPLC Community Members Are you a member of the Data Product Leadership Community? After our chat, I invited Lucas to come give a talk about his journey of moving from “data” to “product” and adopting a producty mindset for analytics and AI work. He was more than happy to oblige. Watch for this in late 2025/early 2026 on our monthly webinar and group discussion calendar.
Note: today’s episode is one of my rare Promoted Episodes. Please help support the show by visiting Gravity’s links below:
Quotes from Today’s Episode “The whole point of data and analytics is to help the business evolve. When your reports make people ask new questions, that’s a win. If the conversations today sound different than they did three months ago, it means you’ve done your job, you’ve helped move the business forward.” — Lucas
“Accuracy is everything. The moment you lose trust, the business, the use case, it's all over. Earning that trust back takes a long time, so we made accuracy our number one design pillar from day one.” — Lucas
“Language models have changed the game in terms of scale. Suddenly, we’re facing all these new kinds of problems, not just in AI, but in the old-school software sense too. Things like privacy, scalability, and figuring out who’s responsible.” — Brian
“Most people building analytics products have never been analysts, and that’s a huge disadvantage. If data doesn’t drive action, you’ve missed the mark. That’s why so many dashboards die quickly.” — Lucas
“Re: collecting feedback so you know if your UX is good: I generally agree that qualitative feedback is the best place to start, not analytics [on your analytics!]. Especially in UX, analytics measure usage aspects of the product, not the subjective human experience. Experience is a collection of feelings and perceptions about how something went.” — Brian
Links
Gravity: https://www.bygravity.com
LinkedIn: https://www.linkedin.com/in/thelosen/
Email Lucas and team: [email protected]
A dialogue on chips and compute options for the DACH region, featuring representatives from Cerebras and SiPearl (the EU landscape) discussing GPUs, cloud vs. on-prem, and related considerations.
The promise of AI in enterprise settings is enormous, but so are the privacy and security challenges. How do you harness AI's capabilities while keeping sensitive data protected within your organization's boundaries? Private AI—using your own models, data, and infrastructure—offers a solution, but implementation isn't straightforward. What governance frameworks need to be in place? How do you evaluate non-deterministic AI systems? When should you build in-house versus leveraging cloud services? As data and software teams evolve in this new landscape, understanding the technical requirements and workflow changes is essential for organizations looking to maintain control over their AI destiny.

Manasi Vartak is Chief AI Architect and VP of Product Management (AI Platform) at Cloudera. She is a product and AI leader with more than a decade of experience at the intersection of AI infrastructure, enterprise software, and go-to-market strategy. At Cloudera, she leads product and engineering teams building low-code and high-code generative AI platforms, driving the company’s enterprise AI strategy and enabling trusted AI adoption across global organizations. Before joining Cloudera through its acquisition of Verta, Manasi was the founder and CEO of Verta, where she transformed her MIT research into enterprise-ready ML infrastructure. She scaled the company to multi-million ARR, serving Fortune 500 clients in finance, insurance, and capital markets, and led the launch of enterprise MLOps and GenAI products used in mission-critical workloads. Manasi earned her PhD in Computer Science from MIT, where she pioneered model management systems such as ModelDB — foundational work that influenced the development of tools like MLflow. Earlier in her career, she held research and engineering roles at Twitter, Facebook, Google, and Microsoft.

In the episode, Richie and Manasi explore AI's role in financial services, the challenges of AI adoption in enterprises, the importance of data governance, the evolving skills needed for AI development, the future of AI agents, and much more.

Links Mentioned in the Show:
- Cloudera
- Cloudera Evolve Conference
- Cloudera Agent Studio
- Connect with Manasi
- Course: Introduction to AI Agents
- Related Episode: RAG 2.0 and The New Era of RAG Agents with Douwe Kiela, CEO at Contextual AI & Adjunct Professor at Stanford University
- Rewatch RADAR AI
- New to DataCamp? Learn on the go using the DataCamp mobile app
- Empower your business with world-class data and AI skills with DataCamp for business
Summary In this episode of the Data Engineering Podcast Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials, and access control in modern data platforms. With the shift to composable ecosystems, integration burdens have exploded, fracturing governance and auditability across warehouses, lakes, files, vector stores, and streaming systems. Matt shares practical solutions, including propagating user identity via JWTs, externalizing policy with engines like OPA/Rego and Cedar, and using database proxies for native row/column security. He also explores catalog-driven governance, lineage-based label propagation, and OpenTDF for binding policies to data objects. The conversation covers machine-to-machine access, short-lived credentials, workload identity, and constraining access by interface choke points, as well as lessons from Zanzibar-style policy models and the human side of enforcement. Matt emphasizes the need for trust composition - unifying provenance, policy, and identity context - to answer questions about data access, usage, and intent across the entire data path.
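To make the identity-propagation pattern concrete, here is a minimal sketch, not UberEther's implementation: a service forwards claims from an already-verified JWT to an Open Policy Agent instance for a decision before executing a query, so the policy lives outside the engine and every decision is auditable. The OPA package path, claim names, and table are hypothetical:

```python
# A minimal sketch, not UberEther's implementation: forward identity claims
# from an already-verified JWT to an externalized OPA policy before running a
# query. The OPA package path ("datapolicy/allow"), claim names, and table are
# hypothetical; JWT signature verification is assumed to happen upstream.
import requests

OPA_URL = "http://localhost:8181/v1/data/datapolicy/allow"  # assumed local OPA sidecar

def is_query_allowed(jwt_claims, table, columns):
    """Ask OPA whether this identity may read these columns of this table."""
    response = requests.post(
        OPA_URL,
        json={"input": {"subject": jwt_claims, "table": table, "columns": columns}},
        timeout=5,
    )
    response.raise_for_status()
    # OPA's data API wraps the policy decision in a "result" field.
    return bool(response.json().get("result", False))

# Example: claims the gateway extracted from the caller's JWT.
claims = {"sub": "analyst-42", "groups": ["finance_readers"]}
if is_query_allowed(claims, "payments", ["amount", "customer_id"]):
    print("run the query with the user's identity attached for the audit trail")
else:
    print("deny the request and record the attempt")
```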
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming: Prefect runs it all, from ingestion to activation, in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.
Your host is Tobias Macey, and today I'm interviewing Matt Topper about the challenges of managing identity and access controls in the context of data systems.

Interview
- Introduction
- How did you get involved in the area of data management?
- The data ecosystem is a uniquely challenging space for creating and enforcing technical controls for identity and access control. What are the key considerations for designing a strategy for addressing those challenges?
- For data access, the off-the-shelf options are typically on either extreme of too coarse or too granular in their capabilities. What do you see as the major factors that contribute to that situation?
- Data governance policies are often used as the primary means of identifying what data can be accessed by whom, but translating that into enforceable constraints is often left as a secondary exercise. How can we as an industry make that a more manageable and sustainable practice?
- How can the audit trails that are generated by data systems be used to inform the technical controls for identity and access?
- How can the foundational technologies of our data platforms be improved to make identity and authz a more composable primitive?
- How does the introduction of streaming/real-time data ingest and delivery complicate the challenges of security controls?
- What are the most interesting, innovative, or unexpected ways that you have seen data teams address ICAM?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on ICAM?
- What are the aspects of ICAM in data systems that you are paying close attention to?
- What are your predictions for the industry adoption or enforcement of those controls?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- UberEther
- JWT == JSON Web Token
- OPA == Open Policy Agent
- Rego
- PingIdentity
- Okta
- Microsoft Entra
- SAML == Security Assertion Markup Language
- OAuth
- OIDC == OpenID Connect
- IDP == Identity Provider
- Kubernetes
- Istio
- Amazon Cedar policy language
- AWS IAM
- PII == Personally Identifiable Information
- CISO == Chief Information Security Officer
- OpenTDF
- OpenFGA
- Google Zanzibar
- Risk Management Framework
- Model Context Protocol
- Google Data Project
- TPM == Trusted Platform Module
- PKI == Public Key Infrastructure
- Passkeys
- DuckLake
- Podcast Episode
- Accumulo
- JDBC
- OpenBao
- Hashicorp Vault
- LDAP

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
As we count down to the 100th episode of Data Unchained, we’re revisiting one of the conversations that perfectly captures the spirit of this show: how data mobility is transforming business. In this look-back episode, host Molly Presley welcomes Harry Carr, CEO of Vcinity, for a deep dive into the technology that’s redefining how enterprises access and move data across distributed environments. Harry explains why hybrid cloud exists, how Vcinity accelerates data access without duplication or compression, and why the future of data architecture lies in making data available anywhere—instantly. From connecting global AI workflows to eliminating the need to move massive datasets, this episode explores what true “data anti-gravity” looks like and how it’s reshaping the modern enterprise. Listen as Molly and Harry discuss the evolution of data architectures, the synergy between Hammerspace and Vcinity, and what it means to build a world where applications and data connect seamlessly, no matter where they live.

Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic | Music promoted by https://www.free-stock-music.com | Creative Commons Attribution 3.0 Unported License: https://creativecommons.org/licenses/by/3.0/deed.en_US
Dive into 'Microsoft Power Platform Solution Architect's Handbook' to master the art of designing and delivering enterprise-grade solutions using Microsoft's cutting-edge Power Platform. Through a mix of practical examples and hands-on tutorials, this book equips you to harness tools like AI, Copilot, and DevOps for building innovative, scalable applications tailored to enterprise needs.

What this book will help me do
- Acquire the knowledge to effectively utilize AI tools such as Power Platform Copilot and ChatGPT to enhance application intelligence.
- Understand and apply enterprise-grade solution architecture principles for scalable and secure application development.
- Gain expertise in integrating heterogeneous systems with Power Platform Pipes and third-party APIs.
- Develop proficiency in creating and maintaining reusable Dataverse data models.
- Learn to establish and manage a Center of Excellence to govern and scale Power Platform solutions.

Author(s)
Hugo Herrera is an experienced solution architect specializing in the Microsoft Power Platform with a deep focus on integrating AI and cloud-native strategies. With years of hands-on experience in enterprise software development and architectural design, Hugo brings real-world insights into his writing, emphasizing practical application of advanced concepts. His approach is clear, structured, and aimed at empowering readers to excel.

Who is it for?
This book is tailored for IT professionals like solution architects, enterprise architects, and technical consultants who are looking to elevate their capabilities in Power Platform development. It is also suitable for individuals with an intermediate understanding of Power Platform seeking to spearhead enterprise-level digital transformation projects. Ideal readers are those ready to deepen their integration, data modeling, and AI usage skills within the Microsoft ecosystem, particularly for enterprise applications.
When we at Bol decided to personalize campaign banners, we did what many companies do: bought an expensive solution. As a software engineering team with zero data science experience, we integrated a third-party recommender system for €1 million annually, built the cloud infrastructure, and waited for results. After our first season, the data told a harsh truth: the third-party tool wasn't delivering value proportional to its cost. We faced a crossroads: accept mediocrity or build our own solution from scratch, tailored to our requirements and architecture.

We'll walk you through our journey of building a more intelligent and flexible recommendation system from the ground up, and how this journey saved us over a million euros per year. We will share the incremental steps that shaped our journey, alongside the valuable lessons learned along the way.