talk-data.com

Topic

Master Data Management

data_governance data_quality data_integration

41 tagged

Activity Trend

Peak of 3 activities per quarter, 2020-Q1 to 2026-Q1

Activities

41 activities · Newest first

Summary

In this episode of the Data Engineering Podcast, Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems, only 50% trust their organization's data overall. Ariel explains why truly productionizing AI demands broader, continuously refreshed data with stronger automation and governance, and highlights the challenges posed by unstructured data and vector stores. The conversation covers the need to shift from manual reviews to automated pipelines, the resurgence of metadata and master data management, and the importance of guardrails, traceability, and agent governance. Ariel also predicts a growing convergence between data teams and application integration teams and advises leaders to focus on high-value use cases, aggressive pipeline automation, and cataloging and governing the coming sprawl of AI agents, all while using AI to accelerate data engineering itself.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Ariel Pohoryles about data management investments that organizations are making to enable them to scale AI implementations.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you start by describing the motivation and scope of your recent survey on data management investments for AI across your respondents?
- What are the key takeaways that were most significant to you?
- The survey reveals a fascinating paradox: 77% of leaders trust the data used by their AI systems, yet only half trust their organization's overall data quality. For our data engineering audience, what does this suggest about how companies are currently sourcing data for AI? Does it imply they are using narrow, manually-curated "golden datasets," and what are the technical challenges and risks of that approach as they try to scale?
- The report highlights a heavy reliance on manual data quality processes, with one expert noting companies feel it's "not reliable to fully automate validation" for external or customer data. At the same time, maturity in "Automated tools for data integration and cleansing" is low, at only 42%. What specific technical hurdles or organizational inertia are preventing teams from adopting more automation in their data quality and integration pipelines?
- There was a significant point made that with generative AI, "biases can scale much faster," making automated governance essential. From a data engineering perspective, how does the data management strategy need to evolve to support generative AI versus traditional ML models? What new types of data quality checks, lineage tracking, or monitoring for feedback loops are required when the model itself is generating new content based on its own outputs?
- The report champions a "centralized data management platform" as the "connective tissue" for reliable AI. How do you see the scale and data maturity impacting the realities of that effort?
- How do architectural patterns in the shape of cloud warehouses, lakehouses, data mesh, data products, etc. factor into that need for centralized/unified platforms?
- A surprising finding was that a third of respondents have not fully grasped the risk of significant inaccuracies in their AI models if they fail to prioritize data management. In your experience, what are the biggest blind spots for data and analytics leaders?
- Looking at the maturity charts, companies rate themselves highly on "Developing a data management strategy" (65%) but lag significantly in areas like "Automated tools for data integration and cleansing" (42%) and "Conducting bias-detection audits" (24%). If you were advising a data engineering team lead based on these findings, what would you tell them to prioritize in the next 6-12 months to bridge the gap between strategy and a truly scalable, trustworthy data foundation for AI?
- The report states that 83% of companies expect to integrate more data sources for their AI in the next year. For a data engineer on the ground, what is the most important capability they need to build into their platform to handle this influx?
- What are the most interesting, innovative, or unexpected ways that you have seen teams addressing the new and accelerated data needs for AI applications?
- What are some of the noteworthy trends or predictions that you have for the near-term future of the impact that AI is having or will have on data teams and systems?

Contact Info

- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

- Boomi
- Data Management
- Integration & Automation Demo
- Agentstudio
- Data Connector Agent Webinar
- Survey Results
- Data Governance
- Shadow IT
- Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

In this episode of Data Unchained, we sit down with Malcolm Hawker, former Gartner analyst and Chief Data Officer at Profisee, to expose the real barriers to AI adoption. We explore why Master Data Management (MDM) is the foundation enterprises overlook, how decentralized systems and unstructured data derail governance, and why CDOs must evolve their role or risk irrelevance. This conversation challenges the myth of a single source of truth, breaks down the politics of data ownership, and offers a new vision for aligning data strategy with AI innovation.

#AIReadiness #MasterDataManagement #DataGovernance #CDOInsights #EnterpriseAI #DataStrategy #UnstructuredData #DataInfrastructure #DigitalTransformation #AILeadership #DataUnchained #Profisee #MalcolmHawker #MollyPresley #TechInnovation

Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information.

Sponsored by: Informatica | Power Analytics and AI on Databricks With Master (Golden) Record Data

Supercharge advanced analytics and AI insights on Databricks with accurate and consistent master data. This session explores how Informatica’s Master Data Management (MDM) integrates with Databricks to provide high-quality, integrated golden record data, such as customer, supplier, and product 360 or reference data, to support downstream analytics, Generative AI and Agentic AI. Enterprises can accelerate and de-risk the process of creating a golden record via a no-code/low-code interface, allowing data teams to quickly integrate siloed data and create a complete and consistent record that improves decision-making speed and accuracy.

Scaling Modern MDM With Databricks, Delta Sharing and Dun & Bradstreet

Master Data Management (MDM) is the foundation of a successful enterprise data strategy — delivering consistency, accuracy and trust across all systems that depend on reliable data. But how can organizations integrate trusted third-party data to enhance their MDM frameworks? How can they ensure that this master data is securely and efficiently shared across internal platforms and external ecosystems? This session explores how Dun & Bradstreet’s pre-mastered data serves as a single source of truth for customers, suppliers and vendors — reducing duplication and driving alignment across enterprise systems. With Delta Sharing, organizations can natively ingest Dun & Bradstreet data into their Databricks environment and establish a scalable, interoperable MDM framework. Delta Sharing also enables secure, real-time distribution of master data across the enterprise, ensuring that every system operates from a consistent and trusted foundation.
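
As a rough illustration of the ingestion pattern described above, the sketch below uses the open source delta-sharing Python connector to load a shared table into a DataFrame. The profile file name and the share, schema, and table identifiers are hypothetical placeholders, not actual Dun & Bradstreet share names.

# Minimal sketch of reading a Delta Sharing table with the open source
# delta-sharing connector (pip install delta-sharing). The share, schema,
# and table names below are invented placeholders.
import delta_sharing

# Profile file supplied by the data provider; it contains the sharing
# server endpoint and a bearer token.
profile = "config.share"

# Table URL format: <profile-file>#<share>.<schema>.<table>
table_url = f"{profile}#dnb_share.reference.company_master"

# Load the shared table into a pandas DataFrame for local inspection.
df = delta_sharing.load_as_pandas(table_url)
print(df.head())

# Inside a Databricks or Spark environment, the same table can be loaded
# as a Spark DataFrame instead:
# df = delta_sharing.load_as_spark(table_url)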

To achieve success, it’s essential to establish effective governance, standardize enterprise practices, and balance overall goals with the needs of individual business units. Governance provides a structured framework for analytics and decision-making, while standardization enhances efficiency by establishing common practices. Together, these elements promote a culture of transparency and accountability, enabling organizations to adapt to change and drive sustainable growth.
In this session, I’ll share how Elsevier’s well-executed Master Data Management strategy helped us strike that balance.

Summary

In this episode of the Data Engineering Podcast, Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data expert and discusses the challenges of reconciling large-scale organizational data. He explains how data silos arise from independent teams and highlights the importance of combining traditional techniques with modern AI to address the nuances of data reconciliation. Dan emphasizes the transformative potential of large language models (LLMs) in creating more natural user experiences, improving trust in AI-driven data solutions, and simplifying complex data management processes. He also discusses the balance between using AI for complex data problems and the necessity of human oversight to ensure accuracy and trust.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

As a listener of the Data Engineering Podcast you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us don't miss Data Citizens® Dialogues, the forward-thinking podcast brought to you by Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. In every episode of Data Citizens® Dialogues, industry leaders unpack data’s impact on the world; like in their episode “The Secret Sauce Behind McDonald’s Data Strategy”, which digs into how AI-driven tools can be used to support crew efficiency and customer interactions. In particular I appreciate the ability to hear about the challenges that enterprise scale businesses are tackling in this fast-moving field. The Data Citizens Dialogues podcast is bringing the data conversation to you, so start listening now! Follow Data Citizens Dialogues on Apple, Spotify, YouTube, or wherever you get your podcasts.

Your host is Tobias Macey and today I'm interviewing Dan Bruckner about the application of ML and AI techniques to the challenge of reconciling data at the scale of business.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you start by giving an overview of the different ways that organizational data becomes unwieldy and needs to be consolidated and reconciled?
- How does that reconciliation relate to the practice of "master data management"?
- What are the scaling challenges with the current set of practices for reconciling data?
- ML has been applied to data cleaning for a long time in the form of entity resolution, etc. How has the landscape evolved or matured in recent years?
- What (if any) transformative capabilities do LLMs introduce?
- What are the missing pieces/improvements that are necessary to make current AI systems usable out-of-the-box for data cleaning?
- What are the strategic decisions that need to be addressed when implementing ML/AI techniques in the data cleaning/reconciliation process?
- What are the risks involved in bringing ML to bear on data cleaning for inexperienced teams?
- What are the most interesting, innovative, or unexpected ways that you have seen ML techniques used in data resolution?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on using ML/AI in master data management?
- When is ML/AI the wrong choice for data cleaning/reconciliation?
- What are your hopes/predictions for the future of ML/AI applications in MDM and data cleaning?

Contact Info

- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

- Tamr
- Master Data Management
- CERN
- LHC
- Michael Stonebraker
- Conway's Law
- Expert Systems
- Information Retrieval
- Active Learning

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
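
To make the entity resolution discussion above concrete, here is a toy sketch, not Tamr's implementation, of the classic block-then-match loop: group records on a cheap blocking key, then score candidate pairs with a string-similarity measure. The records and threshold are invented for illustration.

# Toy entity resolution sketch: blocking plus pairwise similarity scoring.
# This illustrates the general technique discussed in the episode, not any
# vendor's implementation; records and threshold are invented.
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"id": 1, "name": "Acme Corporation", "city": "Boston"},
    {"id": 2, "name": "ACME Corp.", "city": "Boston"},
    {"id": 3, "name": "Globex Inc", "city": "Springfield"},
]

def blocking_key(rec):
    # Compare only records that share a city and a leading name character,
    # which keeps the pairwise comparisons from growing quadratically.
    return (rec["city"].lower(), rec["name"][0].lower())

def name_similarity(a, b):
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

blocks = {}
for rec in records:
    blocks.setdefault(blocking_key(rec), []).append(rec)

THRESHOLD = 0.6  # in practice, tuned on labeled pairs; active learning helps
matches = [
    (a["id"], b["id"], round(name_similarity(a, b), 2))
    for block in blocks.values()
    for a, b in combinations(block, 2)
    if name_similarity(a, b) >= THRESHOLD
]
print(matches)  # [(1, 2, 0.69)]

Production systems replace the hand-rolled similarity function with learned models and add survivorship rules, but the blocking-and-scoring skeleton stays the same.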

It’s a widely held belief that MDM programs are big, disruptive, risky, and prone to failure. While these things may have been true for some companies in the past, providing meaningful business value through the launch (or relaunch) of an MDM program can be done in under 90 days – if you take the right approach. Come listen to former Gartner MDM analyst Malcolm Hawker as he describes the keys to launching an MDM program in under 12 weeks:

- Taking an MVP (minimum viable product) approach to your MDM program

- The importance of choosing the right MDM implementation style 

- How to gain executive alignment and sponsorship 

- Staffing / resourcing an MDM program for speed

- Other practical lessons from the MDM school of hard knocks 

If you’re having trouble getting an MDM program off the ground, or if your existing program is failing to deliver business value, then you won’t want to miss this presentation from the leading expert in the field of Master Data Management. 

The quality and usability of data determine the success of data-driven projects, and it has never been more critical to establish an operational pipeline of high-quality data that is both secure and accessible. Once distinct disciplines, Data Governance, Master Data Management, and Generative AI have converged to deliver data that is insight- and AI-ready in record time. Join our experts for practical examples and actionable advice you can use to get started.

Increasing Data Trust: Enabling Data Governance on Databricks Using Unity Catalog & ML-Driven MDM

As part of Comcast Effectv’s transformation into a completely digital advertising agency, it was key to develop an approach to managing and remediating customer data quality issues so that the sales organization could rely on trustworthy data for data-driven decision making. Like many organizations, Effectv's customer lifecycle processes are spread across many systems utilizing various integrations between them. This results in key challenges like duplicate and redundant customer data that require rationalization and remediation. Data is at the core of Effectv’s modernization journey, with the intended result of winning more business, accelerating order fulfillment, reducing make-goods and identifying revenue.

In partnership with Slalom Consulting, Comcast Effectv built a traditional lakehouse on Databricks to ingest data from all of these systems, but with a twist: they anchored every engineering decision in how it would enable their data governance program.

In this session, we will touch upon the data transformation journey at Effectv and dive deeper into the implementation of data governance leveraging Databricks solutions such as Delta Lake, Unity Catalog, and DBSQL. Key focus areas include how we baked master data management into our pipelines by automating the matching and survivorship process, and how we bring it all together for the data consumer via DBSQL to use our certified assets in the bronze, silver and gold layers.

By making thoughtful decisions about structuring data in Unity Catalog and baking MDM into ETL pipelines, you can greatly increase the quality, reliability, and adoption of single-source-of-truth data so your business users can stop spending cycles on wrangling data and spend more time developing actionable insights for your business.
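
The matching-and-survivorship step described above can be sketched in a few lines of PySpark. This is a minimal illustration under assumed table and column names (bronze.customers_matched, match_id, updated_at), not Effectv's actual pipeline: within each cluster of matched records, the most recently updated row survives as the golden record.

# Minimal survivorship sketch in PySpark, using assumed table and column
# names rather than the actual Effectv implementation. The matching step
# is presumed to have already assigned a shared match_id to duplicates.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.table("bronze.customers_matched")

# Rank the records in each match cluster, newest first.
w = Window.partitionBy("match_id").orderBy(F.col("updated_at").desc())

golden = (
    bronze
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)   # survivor = most recently updated record
    .drop("rn")
)

# Publish certified golden records to the silver layer for DBSQL consumers.
golden.write.mode("overwrite").saveAsTable("silver.customers_golden")

Real survivorship logic is usually attribute-level (most trusted source per column) rather than whole-row recency, but the window-and-rank pattern is the common starting point.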

Talk by: Maggie Davis and Risha Ravindranath

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Designed and implemented well, automated workflows can make the modern business just a little less chaotic and complex. This blog explores the opportunity for automated workflows to help cross-functional teams collaborate and standardize organizational master data. Published at: https://www.eckerson.com/articles/master-data-management-and-operational-workflows-two-modern-use-cases

Part 2: Malcolm Hawker, Head of Data Strategy for Profisee and thought leader in the field of Master Data Management (MDM) and Data Governance. If you're an MDM zealot, let's go deep!

Show Notes:
00:32 The Head of Data Strategy
02:00 Make it Easy, Accurate, and Scale
05:48 What is Real vs Hype
09:00 Profisee's Differentiator
14:26 Reach out to Malcolm
15:28 How to become a CDO
18:50 Focusing on outcomes
24:52 The end of the world

LinkedIn: https://www.linkedin.com/in/malhawker
Website: https://profisee.com/

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Part 1: Welcome Malcolm Hawker, Head of Data Strategy for Profisee and thought leader in the field of Master Data Management (MDM) and Data Governance. If you're an MDM zealot, let's go deep.

Show Notes
01:28 Meet Malcolm Hawker
06:33 MDM's future
09:48 A unique view on data fabric
14:07 The rise and fall of AOL
19:46 The definition of MDM
26:28 MDM reference architecture

LinkedIn: https://www.linkedin.com/in/malhawker
Website: https://profisee.com/

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Summary

The most complicated part of data engineering is the effort involved in making the raw data fit into the narrative of the business. Master Data Management (MDM) is the process of building consensus around what the information actually means in the context of the business and then shaping the data to match those semantics. In this episode Malcolm Hawker shares his years of experience working in this domain to explore the combination of technical and social skills that are necessary to make an MDM project successful both at the outset and over the long term.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Random data doesn’t do it — and production data is not safe (or legal) for developers to use. What if you could mimic your entire production database to create a realistic dataset with zero sensitive data? Tonic.ai does exactly that. With Tonic, you can generate fake data that looks, acts, and behaves like production because it’s made from production. Using universal data connectors and a flexible API, Tonic integrates seamlessly into your existing pipelines and allows you to shape and size your data to the scale, realism, and degree of privacy that you need. The platform offers advanced subsetting, secure de-identification, and ML-driven data synthesis to create targeted test data for all of your pre-production environments. Your newly mimicked datasets are safe to share with developers, QA, data scientists—heck, even distributed teams around the world. Shorten development cycles, eliminate the need for cumbersome data pipeline work, and mathematically guarantee the privacy of your data, with Tonic.ai. Data Engineering Podcast listeners can sign up for a free 2-week sandbox account, go to dataengineeringpodcast.com/tonic today to give it a try!

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.

Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure

COVID, inflation, broken supply chains, and not-so-distant war make this a turbulent time for the modern consumer. During times like these, families tend to their nests, which leads to lots of home-improvement projects…which means lots of painting.

Today we explore the case study of a Fortune 500 producer of the paints and stains that coat many households, consumer products, and even mechanical vehicles. While business expands, this company needs to carefully align the records that track hundreds of suppliers, thousands of storefronts, and millions of customers.

Business expansion and complex supply chains make it particularly important—and challenging—for enterprises such as this paint producer, which we’ll call Bright Colors, to accurately describe the entities that make up their business. They need governed, validated data to describe entities such as their products, locations, and customers. Master data management, also known as MDM, streamlines operations and assists data governance by reconciling disparate data records into golden records and, ideally, a single source of truth.

We’re excited to share our conversation with an industry expert who helps Bright Colors and other Fortune 2000 enterprises navigate turbulent times with effective strategies for MDM and data governance.

Dave Wilkinson is chief technology officer with D3Clarity, a global strategy and implementation services firm that seeks to ensure digital certainty, security, and trust. D3Clarity is a partner of Semarchy, whose Intelligent Data Hub software helps enterprises govern and manage master data, reference data, data quality, enrichment, and workflows. Semarchy sponsored this podcast.

Fast-casual restaurants offer a fascinating microcosm of the turbulent forces confronting enterprises today—and the pivotal role that data plays in helping them maintain competitive advantage. COVID prompted customers to order their Chipotle burritos, Shake Shack milkshakes, and Bruegger’s Bagels for home delivery, and this trend continues in 2022. Supply-chain disruptions, meanwhile, force fast-casual restaurants to make some fast pivots between suppliers in order to keep their shelves stocked. And the market continues to grow as these companies win customers, add locations, and expand delivery partnerships.

These three industry trends—home delivery, supply-chain disruptions, and market expansion—all depend on governed, accurate data to describe entities such as orders, ingredients, and locations. Data quality and master data management therefore play a more pivotal role than ever in the success of fast-casual restaurants. Master data management, also known as MDM, streamlines operations and assists data governance by reconciling disparate data records into a golden record and source of truth. If you’re looking for an ideal case study for how MDM drives enterprise reinvention, agility, and growth, this is it.

We’re excited to talk with an industry expert who helps fast-casual restaurants handle these turbulent forces with effective strategies for managing data and especially master data. Matt Zingariello is Vice President of Data Strategy Services with Keyrus, a global consultancy that helps enterprises use data assets to optimize their digital strategies and customer experience. Matt leads a team that provides industry-specific advisory and implementation services to help enterprises address challenges such as data governance and MDM.

Keyrus is a partner of Semarchy, whose Intelligent Data Hub software helps enterprises govern and manage master data, reference data, data quality, enrichment, and workflows. Semarchy sponsored this podcast.

In our podcast, we'll define data quality and MDM as part of data governance. We’ll explore why enterprises need data quality and MDM, and how they can craft effective data quality and MDM strategies, with a focus on fast-casual restaurants as a case study.

It’s hard to find a data discipline today that is under more pressure than data governance. On one side, the supply of data is exploding. As enterprises transform their business to compete in the 2020s, they digitize myriad events and interactions, which creates mountains of data that they need to control. On the other side, demand for data is exploding. Business owners at all levels of the enterprise need to inform their decisions and drive their operations with data.

Under these pressures, data governance teams must ensure business owners access and consume the right, high-quality data. This requires master data management—the reconciliation of disparate data records into a golden record and source of truth—which assists data governance at many modern enterprises.

In this episode, our host Kevin Petrie, VP of Research at Eckerson Group, talks with our guests Felicia Perez, Managing Director, Information as a Product Program at National Student Clearinghouse, and Patrick O'Halloran, enterprise data scientist, as they define what data quality and MDM are, why you need them, and how best to achieve effective data quality and MDM.