Discover how Dun & Bradstreet and other global enterprises use Data Observability to ensure quality and efficiency and enforce compliance across on-prem and cloud environments. Learn proven strategies to operationalize governance, accelerate cloud migrations, and deliver trusted data for AI and analytics at scale. Join us to learn how Data Observability and Agentic Data Management empower leaders, engineers, and business teams to drive efficiency and savings at petabyte scale.
talk-data.com | Topic: Data Management (1097 tagged)
Top Events
In an era where data complexity and scale challenge every organization, manual intervention can no longer keep pace. Prizm by DQLabs redefines the paradigm—offering a no-touch, agentic data platform that seamlessly integrates Data Quality, Observability, and Semantic Intelligence into one self-learning, self-optimizing ecosystem.
Unlike legacy systems, Prizm is AI-native and Agentic by Design, built from the ground up around a network of intelligent, role-driven agents that observe, recommend, act, and learn in concert to deliver continuous, autonomous data trust.
Join us at Big Data London to discover how Prizm’s agent-driven anomaly detection, data quality enforcement, and deep semantic analysis set a new industry standard—shifting data and AI trust from an operational burden to a competitive advantage that powers actionable, insight-driven outcomes.
Federated data management approaches like data mesh promise to reduce complexity by organizing data into domain-owned, reusable products. But managing data as a product alone isn't enough. In many organizations, true reuse and cross-domain collaboration remain limited, while redundant data products continue to grow, driving up costs without delivering efficiency. To make federated data strategies work, organizations also need a platform where supply and demand can meet, and where valuable products can be easily discovered, understood, accessed, and combined. They need a data product marketplace. In this talk, using real-world examples, we will explore how a data product marketplace:
- Drives reuse and composability of data products, reducing integration costs and helping stabilize maintenance over time.
- Aligns data supply with real business demand, highlighting high-value products and preventing the unchecked growth of low-impact ones.
- Engages the full ecosystem, from producers to consumers, in shaping governance policies and a shared language that support collaboration and trust.
A well-designed data product marketplace is not just a nice-to-have. It is the necessary link that makes federated data management strategies both sustainable and effective.
In the age of agentic AI, competitive advantage lies not only in AI models, but in the quality of the data agents reason on and the agility of the tools that feed them. To fully realize the ROI of agentic AI, organizations need a platform that enables high-quality data pipelines and provides scalable, enterprise-grade tools. In this session, discover how a unified platform for integration, data management, MCP server management, API management, and agent orchestration can help you to bring cohesion and control to how data and agents are used across your organization.
In this short presentation, Big Data LDN Conference Chairman and Europe’s leading IT Industry Analyst in Data Management and Analytics, Mike Ferguson, will welcome everyone to Big Data LDN 2025. He will also summarise where companies are in data, analytics and AI in 2025, what the key challenges and trends are, how these trends are impacting how companies build a data-driven enterprise, and where you can find out more about these at the show.
What if the future of leadership wasn't explained by another CEO, but by an AI? In this special episode of Hub & Spoken, hosted by Jason Foster, CEO & Founder of Cynozure, the guest isn't a data or business leader. It's ChatGPT. Together, they explore one of the most pressing questions for organisations today: What does leadership mean in the age of artificial intelligence? The discussion contrasts the logical view of leadership, vision, decision-making and orchestration, with the uniquely human qualities that machines can't replicate: courage under pressure, conviction, vulnerability, and trust. The result is a fascinating tension. AI can support with logic, speed, and analysis. But leadership is still defined by what makes us human. 🎧 Tune in for this experiment in leadership dialogue. **** Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023 and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation.
Summary In this episode of the AI Engineering Podcast Marc Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, from traditional models to more modern approaches like vectors, RAG, and relational databases. Marc explains why agents require serverless, elastic, and operationally simple databases, and how AWS solutions like Aurora and DSQL address these needs with features such as rapid provisioning, automated patching, geodistribution, and spiky usage. The conversation covers topics including tool calling, improved model capabilities, state in agents versus stateless LLM calls, and the role of Lambda and AgentCore for long-running, session-isolated agents. Marc also touches on the shift from local MCP tools to secure, remote endpoints, the rise of object storage as a durable backplane, and the need for better identity and authorization models. The episode highlights real-world patterns like agent-driven SQL fuzzing and plan analysis, while identifying gaps in simplifying data access, hardening ops for autonomous systems, and evolving serverless database ergonomics to keep pace with agentic development.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Marc Brooker about the impact of agentic workflows on database usage patterns and how they change the architectural requirements for databases.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what the role of the database is in agentic workflows?
- There are numerous types of databases, with relational being the most prevalent. How does the type and purpose of an agent inform the type of database that should be used?
- Anecdotally I have heard about how agentic workloads have become the predominant "customers" of services like Neon and Fly.io. How would you characterize the different patterns of scale for agentic AI applications? (e.g. proliferation of agents, monolithic agents, multi-agent, etc.)
- What are some of the most significant impacts on workload and access patterns for data storage and retrieval that agents introduce?
- What are the categorical differences in that behavior as compared to programmatic/automated systems?
- You have spent a substantial amount of time on Lambda at AWS. Given that LLMs are effectively stateless, how does the added ephemerality of serverless functions impact design and performance considerations around having to "re-hydrate" context when interacting with agents?
- What are the most interesting, innovative, or unexpected ways that you have seen serverless and database systems used for agentic workloads?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on technologies that are supporting agentic applications?

Contact Info
- Blog
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used.
The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- AWS Aurora DSQL
- AWS Lambda
- Three Tier Architecture
- Vector Database
- Graph Database
- Relational Database
- Vector Embedding
- RAG == Retrieval Augmented Generation (AI Engineering Podcast Episode)
- GraphRAG (AI Engineering Podcast Episode)
- LLM Tool Calling
- MCP == Model Context Protocol
- A2A == Agent 2 Agent Protocol
- AWS Bedrock AgentCore
- Strands
- LangChain
- Kiro

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
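To make the agent-and-database pattern from this episode concrete, here is a minimal, generic sketch (not anything from AWS or the interview itself): a stubbed call_llm() stands in for a stateless model call, a single run_sql tool queries an in-memory SQLite table, and the conversation is re-hydrated from an accumulated message list on every turn. The function names and loop structure are illustrative assumptions.

```python
# Sketch of an agent loop: stateless "model" calls plus one SQL tool.
# call_llm() is a hypothetical stand-in for a real LLM API, not a real service.
import json
import sqlite3

def call_llm(messages: list[dict]) -> dict:
    """Hypothetical model call: returns either a tool request or a final answer."""
    if any(m["role"] == "tool" for m in messages):
        return {"type": "answer", "content": "The table has the reported number of rows."}
    return {"type": "tool_call", "tool": "run_sql",
            "arguments": {"query": "SELECT COUNT(*) FROM orders"}}

def run_sql(conn: sqlite3.Connection, query: str) -> str:
    # The only tool exposed to the agent: run a query and return JSON rows.
    return json.dumps(conn.execute(query).fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 12.0)])

# Conversation state; because each model call is stateless, this context is
# re-loaded (re-hydrated) before every turn.
messages = [{"role": "user", "content": "How many orders are there?"}]
for _ in range(5):  # hard cap on agent steps
    reply = call_llm(messages)
    if reply["type"] == "answer":
        print(reply["content"])
        break
    result = run_sql(conn, reply["arguments"]["query"])
    messages.append({"role": "tool", "content": result})
```

In a real deployment the message store and the database would be durable, elastic services (the Aurora, DSQL, and AgentCore themes of the episode), but the control flow stays essentially the same.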
Revolutionize your understanding of modern data management with Apache Polaris (incubating), the open source catalog designed for Apache Iceberg, the data lakehouse industry standard. This comprehensive guide takes you on a journey through the intricacies of Apache Iceberg data lakehouses, highlighting the pivotal role of Iceberg catalogs. Authors Alex Merced, Andrew Madson, and Tomer Shiran explore Apache Polaris's architecture and features in detail, equipping you with the knowledge needed to leverage its full potential. Data engineers, data architects, data scientists, and data analysts will learn how to seamlessly integrate Apache Polaris with popular data tools like Apache Spark, Snowflake, and Dremio to enhance data management capabilities, optimize workflows, and secure datasets.
- Get a comprehensive introduction to Iceberg data lakehouses
- Understand how catalogs facilitate efficient data management and querying in Iceberg
- Explore Apache Polaris's unique architecture and its powerful features
- Deploy Apache Polaris locally, and deploy managed Apache Polaris from Snowflake and Dremio
- Perform basic table operations on Apache Spark, Snowflake, and Dremio (a connection sketch follows below)
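As a companion to the book description above, here is a rough sketch of what connecting Spark to an Iceberg REST catalog such as a local Apache Polaris instance can look like. The endpoint, catalog name, warehouse, and package version are placeholder assumptions, not values taken from the book; check the Polaris and Iceberg documentation for the exact settings your deployment needs.

```python
# Minimal PySpark sketch: register an Iceberg REST catalog and run basic table
# operations. Requires a running REST catalog (e.g. a local Polaris instance).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("polaris-demo")
    # Iceberg runtime for Spark; adjust the version to match your cluster.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # A catalog named "polaris" backed by the Iceberg REST protocol (URI is a placeholder).
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.polaris.type", "rest")
    .config("spark.sql.catalog.polaris.uri", "http://localhost:8181/api/catalog")
    .config("spark.sql.catalog.polaris.warehouse", "demo_catalog")
    .getOrCreate()
)

# Basic table operations against the catalog (namespace and table names are examples).
spark.sql("CREATE NAMESPACE IF NOT EXISTS polaris.analytics")
spark.sql("CREATE TABLE IF NOT EXISTS polaris.analytics.events "
          "(id BIGINT, ts TIMESTAMP) USING iceberg")
spark.sql("INSERT INTO polaris.analytics.events VALUES (1, current_timestamp())")
spark.sql("SELECT * FROM polaris.analytics.events").show()
```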
Discover how data management maturity assessments can spark group-wide excellence in data-driven decision making. We will explore the interview-based approach of OTP Group supported by DAMA Hungary, share how its results can be turned into practical value, and give a glimpse into OTP Group’s journey. You’ll also get a teaser of a potential AI-powered agent designed to make assessments smarter and faster, and a look at our vision for future innovation.
Summary In this episode of the Data Engineering Podcast Hannes Mühleisen and Mark Raasveldt, the creators of DuckDB, share their work on DuckLake, a new entrant in the open lakehouse ecosystem. They discuss how DuckLake focuses on simplicity and flexibility and offers a unified catalog and table format compared to other lakehouse formats like Iceberg and Delta. Hannes and Mark share insights into how DuckLake revolutionizes data architecture by enabling local-first data processing, simplifying deployment of lakehouse solutions, and offering benefits such as encryption features, data inlining, and integration with existing ecosystems.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Hannes Mühleisen and Mark Raasveldt about DuckLake, the latest entrant into the open lakehouse ecosystem.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what DuckLake is and the story behind it?
- What are the particular problems that DuckLake is solving for?
- How does this compare to the capabilities of MotherDuck?
- Iceberg and Delta already have a well established ecosystem, but so does DuckDB. Who are the primary personas that you are trying to focus on in these early days of DuckLake?
- One of the major factors driving the adoption of formats like Iceberg is cost efficiency for large volumes of data. That brings with it challenges of large batch processing of data. How does DuckLake account for these axes of scale?
- There is also a substantial investment in the ecosystem of technologies that support Iceberg. The most notable ecosystem challenge for DuckDB and DuckLake is in the query layer. How are you thinking about the evolution and growth of that capability beyond DuckDB (e.g. support in Trino/Spark/Flink)?
- What are your opinions on the viability of a future where DuckLake and Iceberg become a unified standard and implementation? (why can't Iceberg REST catalog implementations just use DuckLake under the hood?)
- Digging into the specifics of the specification and implementation, what are some of the capabilities that it offers above and beyond Iceberg?
- Is it now possible to enforce PK/FK constraints, indexing on underlying data?
- Given that DuckDB has a vector type, how do you think about the support for vector storage/indexing?
- How do the capabilities of DuckLake and the integration with DuckDB change the ways that data teams design their data architecture and access patterns?
- What are your thoughts on the impact of "data gravity" in today's data ecosystem, with engines like DuckDB, KuzuDB, LanceDB, etc. available for embedded and edge use cases?
- What are the most interesting, innovative, or unexpected ways that you have seen DuckLake used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on DuckLake?
- When is DuckLake the wrong choice?
- What do you have planned for the future of DuckLake?

Contact Info
- Hannes: Website
- Mark: Website

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- DuckDB (Podcast Episode)
- DuckLake
- DuckDB Labs
- MySQL
- CWI
- MonetDB
- Iceberg
- Iceberg REST Catalog
- Delta
- Hudi
- Lance
- DuckDB Iceberg Connector
- ACID == Atomicity, Consistency, Isolation, Durability
- MotherDuck
- MotherDuck Managed DuckLake
- Trino
- Spark
- Presto
- Spark DuckLake Demo
- Delta Kernel
- Arrow
- dlt
- S3 Tables
- Attribute Based Access Control (ABAC)
- Parquet
- Arrow Flight
- Hadoop
- HDFS
- DuckLake Roadmap

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
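For readers who want to try the ideas from the DuckLake episode, the following is a minimal sketch using DuckDB's Python API. The INSTALL/LOAD/ATTACH incantation follows the published DuckLake quickstart but should be treated as an assumption to verify against current documentation; file paths and table names are examples only.

```python
# Sketch: a local DuckLake catalog (SQL database file) with Parquet data files.
import duckdb

con = duckdb.connect()
con.sql("INSTALL ducklake")   # assumed extension name per the DuckLake announcement
con.sql("LOAD ducklake")

# The catalog (table metadata) lives in a SQL database file, while table data is
# written under DATA_PATH -- the "SQL database as catalog" idea from the episode.
con.sql("ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'lake_data/')")
con.sql("USE lake")

con.sql("CREATE TABLE IF NOT EXISTS events (id INTEGER, payload VARCHAR)")
con.sql("INSERT INTO events VALUES (1, 'hello'), (2, 'world')")
print(con.sql("SELECT count(*) AS n FROM events").fetchall())
```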
In this episode of Hub & Spoken, host Jason Foster, CEO & Founder of Cynozure, is joined by James Lupton, Chief Technology Officer at Cynozure, to explore the findings from What Matters Most for Insurers Now — a new report shaped by insights from 35 senior leaders across the insurance industry. While the report focuses on insurers, its lessons resonate far more widely. Jason and James discuss how organisations across sectors are wrestling with the same issues: outdated and overly customised legacy systems that hold back innovation, a persistent gap between the ambition to build a data-driven culture and the actions taken to achieve it, and the importance of leadership support that goes beyond lip service to meaningful investment and behaviour change. They also consider the next frontier: AI agents. With many firms experimenting but few ready to deploy, Jason and James unpack what true readiness looks like and why success requires more than just technology. This episode offers practical reflections for leaders in complex, regulated industries who are striving to "fix forward" and unlock the real value of data and AI. Download What Matters Most for Insurers Now here ***** Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023 and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation.
The line between human work and AI capabilities is blurring in today's business environment. AI agents are now handling autonomous tasks across customer support, data management, and sales prospecting with increasing sophistication. But how do you effectively integrate these agents into your existing workflows? What's the right approach to training and evaluating AI team members? With data quality being the foundation of successful AI implementation, how can you ensure your systems have the unified context they need while maintaining proper governance and privacy controls? Karen Ng is the Head of Product at HubSpot, where she leads product strategy, design, and partnerships with the mission of helping millions of organizations grow better. Since joining in 2022, she has driven innovation across Smart CRM, Operations Hub, Breeze Intelligence, and the developer ecosystem, with a focus on unifying structured and unstructured data to make AI truly useful for businesses. Known for leading with clarity and “AI speed,” she pushes HubSpot to stay ahead of disruption and empower customers to thrive. Previously, Karen held senior product leadership roles at Common Room, Google, and Microsoft. At Common Room, she built the product and data science teams from the ground up, while at Google she directed Android’s product frameworks like Jetpack and Jetpack Compose. During more than a decade at Microsoft, she helped shape the company’s .NET strategy and launched the Roslyn compiler platform. Recognized as a Product 50 Winner and recipient of the PM Award for Technical Strategist, she also advises and invests in high-growth technology companies. In the episode, Richie and Karen explore the evolving role of AI agents in sales, marketing, and support, the distinction between chatbots, co-pilots, and autonomous agents, the importance of data quality and context, the concept of hybrid teams, the future of AI-driven business processes, and much more.

Links Mentioned in the Show:
- HubSpot Breeze Agents
- Connect with Karen
- Webinar: Pricing & Monetizing Your AI Products with Sam Lee, VP of Pricing Strategy & Product Operations at HubSpot
- Related Episode: Enterprise AI Agents with Jun Qian, VP of Generative AI Services at Oracle
- Rewatch RADAR AI

New to DataCamp?
- Learn on the go using the DataCamp mobile app
- Empower your business with world-class data and AI skills with DataCamp for business
Data Modeling with Snowflake provides a clear and practical guide to mastering data modeling tailored to the Snowflake Data Cloud. By integrating foundational principles of database modeling with Snowflake's unique features and functionality, this book empowers you to create scalable, cost-effective, and high-performing data solutions.

What this Book will help me do
- Apply universal data modeling concepts within the Snowflake platform effectively.
- Leverage Snowflake's features such as Time Travel and Zero-Copy Cloning for optimized data solutions (see the sketch after this description).
- Understand and utilize advanced techniques like Data Vault and Data Mesh for scalable data architecture.
- Master handling semi-structured data in Snowflake using practical recipes and examples.
- Achieve cost efficiency and resource optimization by aligning modeling principles with Snowflake's architecture.

Author(s)
Serge Gershkovich is an accomplished data engineer and seasoned professional in data architecture and modeling. With a passion for simplifying complex concepts, Serge's work leverages his years of hands-on experience to guide readers in mastering both foundational and advanced data management practices. His clear and practical approach ensures accessibility for all levels.

Who is it for?
This book is ideal for data developers and engineers seeking practical modeling guidance within Snowflake. It's suitable for data analysts looking to broaden their database design expertise, and for database beginners aiming to get a head start in structuring data. Professionals new to Snowflake will also find its clear explanations of key features aligned with modeling techniques invaluable.
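To illustrate the two Snowflake features called out in the list above (Time Travel and Zero-Copy Cloning), here is a small, hedged sketch using the Snowflake Python connector. Connection parameters and table names are placeholders, not examples taken from the book.

```python
# Sketch: zero-copy cloning and Time Travel via the Snowflake Python connector.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="...",  # placeholders
    warehouse="COMPUTE_WH", database="DEMO_DB", schema="PUBLIC",
)
cur = conn.cursor()

# Zero-copy clone: a new table that initially shares the original's storage.
cur.execute("CREATE OR REPLACE TABLE orders_dev CLONE orders")

# Time Travel: query the table as it looked five minutes ago.
cur.execute("SELECT COUNT(*) FROM orders AT (OFFSET => -60 * 5)")
print(cur.fetchone())

cur.close()
conn.close()
```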
Summary In this episode of the Data Engineering Podcast Serge Gershkovich, head of product at SqlDBM, talks about the socio-technical aspects of data modeling. Serge shares his background in data modeling and highlights its importance as a collaborative process between business stakeholders and data teams. He debunks common misconceptions that data modeling is optional or secondary, emphasizing its crucial role in ensuring alignment between business requirements and data structures. The conversation covers challenges in complex environments, the impact of technical decisions on data strategy, and the evolving role of AI in data management. Serge stresses the need for business stakeholders' involvement in data initiatives and a systematic approach to data modeling, warning against relying solely on technical expertise without considering business alignment.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Enterprises today face an enormous challenge: they’re investing billions into Snowflake and Databricks, but without strong foundations, those investments risk becoming fragmented, expensive, and hard to govern. And that’s especially evident in large, complex enterprise data environments. That’s why companies like DirecTV and Pfizer rely on SqlDBM. Data modeling may be one of the most traditional practices in IT, but it remains the backbone of enterprise data strategy. In today’s cloud era, that backbone needs a modern approach built natively for the cloud, with direct connections to the very platforms driving your business forward. Without strong modeling, data management becomes chaotic, analytics lose trust, and AI initiatives fail to scale. SqlDBM ensures enterprises don’t just move to the cloud—they maximize their ROI by creating governed, scalable, and business-aligned data environments. If global enterprises are using SqlDBM to tackle the biggest challenges in data management, analytics, and AI, isn’t it worth exploring what it can do for yours? Visit dataengineeringpodcast.com/sqldbm to learn more.

Your host is Tobias Macey and today I'm interviewing Serge Gershkovich about how and why data modeling is a sociotechnical endeavor.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by describing the activities that you think of when someone says the term "data modeling"?
- What are the main groupings of incomplete or inaccurate definitions that you typically encounter in conversation on the topic?
- How do those conceptions of the problem lead to challenges and bottlenecks in execution?
- Data modeling is often associated with data warehouse design, but it also extends to source systems and unstructured/semi-structured assets. How does the inclusion of other data localities help in the overall success of a data/domain modeling effort?
- Another aspect of data modeling that often consumes a substantial amount of debate is which pattern to adhere to (star/snowflake, data vault, one big table, anchor modeling, etc.). What are some of the ways that you have found effective to remove that as a stumbling block when first developing an organizational domain representation?
- While the overall purpose of data modeling is to provide a digital representation of the business processes, there are inevitable technical decisions to be made. What are the most significant ways that the underlying technical systems can help or hinder the goals of building a digital twin of the business?
- What impact (positive and negative) are you seeing from the introduction of LLMs into the workflow of data modeling?
- How does tool use (e.g. MCP connection to warehouse/lakehouse) help when developing the transformation logic for achieving a given domain representation?
- What are the most interesting, innovative, or unexpected ways that you have seen organizations address the data modeling lifecycle?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working with organizations implementing a data modeling effort?
- What are the overall trends in the ecosystem that you are monitoring related to data modeling practices?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
- SqlDBM
- SAP
- Joe Reis
- ERD == Entity Relation Diagram
- Master Data Management
- dbt
- Data Contracts
- Data Modeling With Snowflake book by Serge (affiliate link)
- Type 2 Dimension
- Data Vault
- Star Schema
- Anchor Modeling
- Ralph Kimball
- Bill Inmon
- Sixth Normal Form
- MCP == Model Context Protocol

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Summary In this episode of the Data Engineering Podcast Professor Paul Groth, from the University of Amsterdam, talks about his research on knowledge graphs and data engineering. Paul shares his background in AI and data management, discussing the evolution of data provenance and lineage, as well as the challenges of data integration. He explores the impact of large language models (LLMs) on data engineering, highlighting their potential to simplify knowledge graph construction and enhance data integration. The conversation covers the evolving landscape of data architectures, managing semantics and access control, and the interplay between industry and academia in advancing data engineering practices, with Paul also sharing insights into his work with the intelligent data engineering lab and the importance of human-AI collaboration in data engineering pipelines.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Paul Groth about his research on knowledge graphs and data engineering.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by describing the focus and scope of your academic efforts?
- Given your focus on data management for machine learning as part of the INDELab, what are some of the developing trends that practitioners should be aware of?
- ML architectures/systems changing (Matteo Interlandi); GPUs for data management
- You have spent a large portion of your career working with knowledge graphs, which have largely been a niche area until recently. What are some of the notable changes in the knowledge graph ecosystem that have resulted from the introduction of LLMs?
- What are some of the other ways that you are seeing LLMs change the methods of data engineering?
- There are numerous vague and anecdotal references to the power of LLMs to unlock value from unstructured data. What are some of the realities that you are seeing in your research?
- A majority of the conversations in this podcast are focused on data engineering in the context of a business organization. What are some of the ways that management of research data is disjoint from the methods and constraints that are present in business contexts?
- What are the most interesting, innovative, or unexpected ways that you have seen LLMs used in data management?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on data engineering research?
- What do you have planned for the future of your research in the context of data engineering, knowledge graphs, and AI?

Contact Info
- Website
- Email

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- INDELab
- Data Provenance
- Elsevier
- SIGMOD 2025
- Digital Twin
- Knowledge Graph
- WikiData
- KuzuDB (Podcast Episode)
- data.world (Podcast Episode)
- GraphRAG
- SPARQL
- Semantic Web
- GQL == Graph Query Language
- Cypher
- Amazon Neptune
- RDF == Resource Description Framework
- SwellDB
- FlockMTL
- DuckDB (Podcast Episode)
- Matteo Interlandi
- Paolo Papotti
- Neuromorphic Computing
- Point Clouds
- Longform.ai
- BASIL DB

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
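One theme of the episode, using LLMs to help construct knowledge graphs, can be sketched generically as follows. The extract_triples() function is a hypothetical stand-in for a model call (the prompt and output format are assumptions, not anything from Paul's research); the resulting triples are simply loaded into a NetworkX graph.

```python
# Sketch of LLM-assisted knowledge graph construction with a stubbed extractor.
import networkx as nx

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Hypothetical LLM extraction call returning (subject, relation, object) triples."""
    # A real implementation would prompt a model over the source text.
    return [("Paul Groth", "works_at", "University of Amsterdam"),
            ("INDELab", "studies", "data engineering")]

graph = nx.DiGraph()
for subj, rel, obj in extract_triples("...source document text..."):
    graph.add_edge(subj, obj, relation=rel)

for u, v, data in graph.edges(data=True):
    print(f"{u} -[{data['relation']}]-> {v}")
```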
In this episode of Hub & Spoken, Jason Foster, CEO & Founder of Cynozure, speaks with David Germain, portfolio Non-Executive Director and former senior technology and transformation leader in banking, financial services and insurance. Drawing on 30 years of global experience, David shares how sustainable business growth depends on more than just strategy and technology - it's rooted in inclusive leadership, organisational culture, and curiosity at every level. They explore why leadership teams must reflect their customer base, how to create psychological safety to encourage innovation, and why "constructive disruption" is essential for long-term success. David discusses the challenge of balancing today's operational pressures with the future ambitions of an organisation, and why trust, diversity of thought, and resilience are non-negotiables. The conversation also examines the role of technology, particularly AI, as both an enabler and a disruptor, and why leaders must prepare their people for the cultural and operational shifts it brings. If you're a business leader seeking practical ways to align people, culture, and technology for lasting impact, this episode offers clear, real-world perspectives. —— Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023 and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation.
Summary In this episode of the Data Engineering Podcast Prashanth Rao, an AI engineer at KuzuDB, talks about their embeddable graph database. Prashanth explains how KuzuDB addresses performance shortcomings in existing solutions through columnar storage and novel join algorithms. He discusses the usability and scalability of KuzuDB, emphasizing its open-source nature and potential for various graph applications. The conversation explores the growing interest in graph databases due to their AI and data engineering applications, and Prashanth highlights KuzuDB's potential in edge computing, ephemeral workloads, and integration with other formats like Iceberg and Parquet.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Prashanth Rao about KuzuDB, an embeddable graph database.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what KuzuDB is and the story behind it?
- What are the core use cases that Kuzu is focused on addressing?
- What is explicitly out of scope?
- Graph engines have been available and in use for a long time, but generally for more niche use cases. How would you characterize the current state of the graph data ecosystem?
- You note scalability as a feature of Kuzu, which is a phrase with many potential interpretations. Typically horizontal scaling of graphs has been complicated, in what sense does Kuzu make that claim?
- Can you describe some of the typical architecture and integration patterns of Kuzu?
- What are some of the more interesting or esoteric means of architecting with Kuzu?
- For cases where Kuzu is rendering a graph across an external data repository (e.g. Iceberg, etc.), what are the patterns for balancing data freshness with network/compute efficiency? (e.g. read and create every time or persist the Kuzu state)
- Can you describe the internal architecture of Kuzu and key design factors?
- What are the benefits and tradeoffs of using a columnar store with adjacency lists vs. a more graph-native storage format?
- What are the most interesting, innovative, or unexpected ways that you have seen Kuzu used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Kuzu?
- When is Kuzu the wrong choice?
- What do you have planned for the future of Kuzu?

Contact Info
- Website
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
- KuzuDB
- BERT
- Transformer Architecture
- DuckDB (Podcast Episode)
- MonetDB
- Umbra DB
- sqlite
- Cypher Query Language
- Property Graph
- Neo4J
- GraphRAG
- Context Engineering
- Write-Ahead Log
- Bauplan
- Iceberg
- DuckLake
- Lance
- LanceDB
- Arrow
- Polars
- Arrow DataFusion
- GQL
- ClickHouse
- Adjacency List
- Why Graph Databases Need New Join Algorithms
- KuzuDB WASM
- RAG == Retrieval Augmented Generation
- NetworkX

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
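As a companion to the KuzuDB episode, here is a minimal sketch of the embedded usage pattern from Python: create an on-disk database, define a property-graph schema, and query it with Cypher. The schema and data are illustrative only, and the exact API surface should be confirmed against current Kuzu documentation.

```python
# Sketch: Kuzu as an embedded graph database (no server process).
import kuzu

db = kuzu.Database("./kuzu_demo")   # on-disk database path (example)
conn = kuzu.Connection(db)

# Property-graph schema: node and relationship tables, queried with Cypher.
conn.execute("CREATE NODE TABLE Person(name STRING, PRIMARY KEY(name))")
conn.execute("CREATE REL TABLE Knows(FROM Person TO Person)")

conn.execute("CREATE (:Person {name: 'Ada'})")
conn.execute("CREATE (:Person {name: 'Grace'})")
conn.execute(
    "MATCH (a:Person), (b:Person) WHERE a.name = 'Ada' AND b.name = 'Grace' "
    "CREATE (a)-[:Knows]->(b)"
)

result = conn.execute("MATCH (a:Person)-[:Knows]->(b:Person) RETURN a.name, b.name")
while result.has_next():
    print(result.get_next())
```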
Combining LLMs with enterprise knowledge bases is creating powerful new agents that can transform business operations. These systems are dramatically improving on traditional chatbots by understanding context, following conversations naturally, and accessing up-to-date information. But how do you effectively manage the knowledge that powers these agents? What governance structures need to be in place before deployment? And as we look toward a future with physical AI and robotics, what fundamental computing challenges must we solve to ensure these technologies enhance rather than complicate our lives? Jun Qian is an accomplished technology leader with extensive experience in artificial intelligence and machine learning. Currently serving as Vice President of Generative AI Services at Oracle since May 2020, Jun founded and leads the Engineering and Science group, focusing on the creation and enhancement of Generative AI services and AI Agents. Previously held roles include Vice President of AI Science and Development at Oracle, Head of AI and Machine Learning at Sift, and Principal Group Engineering Manager at Microsoft, where Jun co-founded Microsoft Power Virtual Agents. Jun's career also includes significant contributions as the Founding Manager of Amazon Machine Learning at AWS and as a Principal Investigator at Verizon. In the episode, Richie and Jun explore the evolution of AI agents, the unique features of ChatGPT, the challenges and advancements in chatbot technology, the importance of data management and security in AI, and the future of AI in computing and robotics, and much more.

Links Mentioned in the Show:
- Oracle
- Connect with Jun
- Course: Introduction to AI Agents
- Jun at DataCamp RADAR
- Related Episode: A Framework for GenAI App and Agent Development with Jerry Liu, CEO at LlamaIndex
- Rewatch RADAR AI

New to DataCamp?
- Learn on the go using the DataCamp mobile app
- Empower your business with world-class data and AI skills with DataCamp for business
Struggling with data trust issues, dashboard drama, or constant pipeline firefighting? In this deep-dive interview, Lior Barak shows you how to shift from a reactive "fix-it" culture to a mindful, impact-driven practice rooted in Zen/Wabi-Sabi principles. You'll learn:
- Why 97% of CEOs say they use data, but only 24% call themselves data-driven
- The traffic-light dashboard pattern (green / yellow / red) that instantly tells execs whether numbers are safe to use (a rough sketch follows after this list)
- A practical rule for balancing maintenance, rollout, and innovation—and avoiding team burnout
- How to quantify ROI on data products, kill failing legacy systems, and handle ad-hoc exec requests without derailing roadmaps
- Turning "imperfect" data into business value with mindful communication, root-cause logs, and automated incident review loops
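The traffic-light idea above can be reduced to a tiny rule, shown here as an illustrative sketch (not Lior's actual implementation): derive green/yellow/red from a couple of health signals such as data freshness and failed quality checks. The thresholds are arbitrary placeholders.

```python
# Sketch: map simple health checks to a dashboard traffic-light status.
from datetime import datetime, timedelta, timezone

def dashboard_status(last_loaded_at: datetime, failed_checks: int) -> str:
    age = datetime.now(timezone.utc) - last_loaded_at
    if failed_checks > 0 or age > timedelta(hours=24):
        return "red"      # do not use these numbers
    if age > timedelta(hours=6):
        return "yellow"   # usable, but stale; interpret with care
    return "green"        # safe to use

print(dashboard_status(datetime.now(timezone.utc) - timedelta(hours=2), failed_checks=0))
```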
🕒 TIMECODES
00:00 Community and mindful data strategy
04:06 Career journey and product management insights
08:03 Wabi-sabi data and the trust crisis
11:47 AI, data imperfection, and trust challenges
20:05 Trust crisis examples and root cause analysis
25:06 Regaining trust through mindful data management
30:47 Traffic light system and effective communication
37:41 Communication gaps and team workload balance
39:58 Maintenance stress and embracing Zen mindset
49:29 Accepting imperfection and measuring impact
56:19 Legacy systems and managing executive requests
01:00:23 Role guidance and closing reflections
🔗 Connect with Lior
LinkedIn - https://www.linkedin.com/in/liorbarak
Website - https://cookingdata.substack.com/
Cooking Data newsletter: https://cookingdata.substack.com/
Data Product Lifecycle Manager: https://app--data-product-lifecycle-manager-c81b10bb.base44.app/
🔗 Connect with DataTalks.Club
Join the community - https://datatalks.club/slack.html
Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/u/0/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
Check other upcoming events - https://lu.ma/dtc-events
GitHub: https://github.com/DataTalksClub
LinkedIn - https://www.linkedin.com/company/datatalks-club/
Twitter - https://x.com/DataTalksClub
Website - https://datatalks.club/
🔗 Connect with Alexey
Twitter - https://x.com/Al_Grigor
LinkedIn - https://www.linkedin.com/in/agrigorev/
Summary In this episode of the Data Engineering Podcast Lucas Thelosen and Drew Gilson from Gravity talk about their development of Orion, an autonomous data analyst that bridges the gap between data availability and business decision-making. Lucas and Drew share their backgrounds in data analytics and how their experiences have shaped their approach to leveraging AI for data analysis, emphasizing the potential of AI to democratize data insights and make sophisticated analysis accessible to companies of all sizes. They discuss the technical aspects of Orion, a multi-agent system designed to automate data analysis and provide actionable insights, highlighting the importance of integrating AI into existing workflows with accuracy and trustworthiness in mind. The conversation also explores how AI can free data analysts from routine tasks, enabling them to focus on strategic decision-making and stakeholder management, as they discuss the future of AI in data analytics and its transformative impact on businesses.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Lucas Thelosen and Drew Gilson about the engineering and impact of building an autonomous data analyst.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Orion is and the story behind it?
- How do you envision the role of an agentic analyst in an organizational context?
- There have been several attempts at building LLM-powered data analysis, many of which are essentially a text-to-SQL interface. How have the capabilities and architectural patterns grown in the past ~2 years to enable a more capable system?
- One of the key success factors for a data analyst is their ability to translate business questions into technical representations. How can an autonomous AI-powered system understand the complex nuance of the business to build effective analyses?
- Many agentic approaches to analytics require a substantial investment in data architecture, documentation, and semantic models to be effective. What are the gradations of effectiveness for autonomous analytics for companies who are at different points on their journey to technical maturity?
- Beyond raw capability, there is also a significant need to invest in user experience design for an agentic analyst to be useful. What are the key interaction patterns that you have found to be helpful as you have developed your system?
- How does the introduction of a system like Orion shift the workload for data teams?
- Can you describe the overall system design and technical architecture of Orion?
- How has that changed as you gained further experience and understanding of the problem space?
- What are the most interesting, innovative, or unexpected ways that you have seen Orion used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Orion?
- When is Orion/agentic analytics the wrong choice?
- What do you have planned for the future of Orion?

Contact Info
- Lucas: LinkedIn
- Drew: LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Orion
- Looker
- Gravity
- VBA == Visual Basic for Applications
- Text-To-SQL
- One-shot
- LookML
- Data Grain
- LLM As A Judge
- Google Large Time Series Model

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
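The episode's links mention text-to-SQL and LLM-as-a-judge style safeguards; the following generic sketch (explicitly not Orion's architecture) shows one such guardrail: a stubbed generate_sql() call whose output is validated with EXPLAIN before it is executed. All names and the validation step are illustrative assumptions.

```python
# Sketch: validate generated SQL before running it against the warehouse
# (SQLite stands in for the analytical database here).
import sqlite3

def generate_sql(question: str, schema: str) -> str:
    """Hypothetical LLM call that turns a business question into SQL."""
    return "SELECT region, SUM(revenue) AS revenue FROM sales GROUP BY region"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("EMEA", 10.0), ("AMER", 20.0)])

candidate = generate_sql("Revenue by region?", "sales(region, revenue)")
try:
    conn.execute("EXPLAIN " + candidate)   # cheap validity check, no data returned to the user
except sqlite3.Error as exc:
    raise RuntimeError(f"Rejected generated SQL: {exc}")

for row in conn.execute(candidate):
    print(row)
```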