talk-data.com

Topic

Data Management

data_governance data_quality metadata_management

1097

tagged

Activity Trend

Peak of 88 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1097 activities · Newest first

D&A leaders must develop DataOps as an essential practice to redefine their data management operations. This involves establishing business value before pursuing significant data engineering initiatives, and preventing duplicated effort by different teams managing the common metadata, security and observability of information assets within their data platforms.

To achieve success, it’s essential to establish effective governance, standardise enterprise practices, and balance overall goals with the needs of individual business units. Governance provides a structured framework for analytics and decision-making, while standardisation enhances efficiency by establishing common practices. Together, these elements promote a culture of transparency and accountability, enabling organisations to adapt to change and drive sustainable growth.
In this session, I’ll share how Elsevier’s well-executed Master Data Management strategy helped us strike that balance.

Metadata, data quality and data observability tools provide significant capabilities to ensure good data for your BI and AI initiatives. Metadata tools help discover and inventory your data assets. Data quality tools help business users manage their data at the source by setting rules and policies. Data observability tools give organizations integrated visibility over the health of their data, data pipelines and data landscape. Together, these tools help organizations lay a good foundation in data management for BI and AI initiatives.
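As a toy illustration of the rule-and-policy idea behind data quality tools, here is a minimal sketch in plain Python. The rule names, helper functions, and sample records are invented for the example and are not drawn from any particular product:

```python
# Minimal sketch of rule-based data quality checks, in the spirit of the
# "rules and policies" that data quality tools let business users define.
# All rule names and sample records here are invented for illustration.

def not_null(field):
    """Rule: the field must be present and non-empty."""
    return lambda rec: rec.get(field) not in (None, "")

def in_range(field, lo, hi):
    """Rule: the numeric field must fall within [lo, hi]."""
    return lambda rec: rec.get(field) is not None and lo <= rec[field] <= hi

def run_checks(records, rules):
    """Return a list of (record_index, rule_name) violations."""
    violations = []
    for i, rec in enumerate(records):
        for name, rule in rules.items():
            if not rule(rec):
                violations.append((i, name))
    return violations

customers = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 150},  # fails both rules
]
rules = {
    "email_not_null": not_null("email"),
    "age_in_range": in_range("age", 0, 120),
}
print(run_checks(customers, rules))  # [(1, 'email_not_null'), (1, 'age_in_range')]
```

Real tools add lineage, scheduling, and source-side enforcement on top, but the core loop — declarative rules evaluated against records, producing a violation report — is the same shape.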

Data ecosystems, built on data fabric design and infused with AI, promise an integrated, cost-effective, and operationally simple approach to varied data management challenges. However, they don't yet always deliver on that promise. This research explores the maturity of various ecosystem components and provides a guide for D&A leaders and others looking to invest in data foundations for competitive differentiation.

Learn how one of Europe's leading insurers is leveraging AI in its Next Generation Data Governance strategy. This presentation reveals the essence of AI-ready data, comprehensive metadata frameworks, product thinking and data excellence to power data governance and AI adoption at scale.
Essential insights for data leaders seeking to combine transformative AI capabilities with enterprise-grade data management.

Data management continues to evolve and is increasingly becoming a dedicated function led by heads of data management. Gartner has been investigating what makes heads of data management successful. This session will discuss the organizational structures, operating models, architecture and technology that characterize successful data management.

Malcolm Hawker describes MDM as a ‘must have’, while Juan Sequeda has described it as a ‘fancy integration’. As many CDOs use MDM to solve decades-old problems, others turn to data catalogs as a natural starting point in their data journeys. This divide highlights the difficulty CDOs face when prioritizing data initiatives: should they start with data management or governance? Come hear two data experts debate:

- MDM Build vs. Buy
- Where should CDOs prioritize? MDM or Catalogs?
- What role do data products play in this choice?

Discover how Data Mesh is transforming data management by decentralizing delivery and empowering business-driven D&A initiatives. You will find out what data mesh is, its benefits, and the most common challenges. We will provide a successful path based on the experience of early adopters, allowing you to avoid the most common pitfalls and adopt data mesh successfully.

Data integration is evergreen, serving as a foundational element of any resilient data management strategy.
This session gives guidance on:
1. Top data engineering practices: What are the best practices for improving data integration?
2. Technology trends: What are the trends guiding data integration technology?
3. Prioritization: Which of these top practices would prove the most impactful for your organization, given your current level of maturity?

Is D&A governance just another data management initiative, or is it more? Join us in this two-speaker debate session that clarifies the differences and synergies between the two practices, and why D&A leaders should care about making that distinction. You will learn best practices for complementing D&A governance with data management practices.

Urgent investments in data, analytics and AI use cases have put the spotlight once more on strong data management foundations. "Is our data even ready for upcoming AI, analytics and data sharing initiatives?" is now top of mind for heads of data, CDAOs and their counterparts. Data fabrics have emerged as a long-term, foundational data management architecture that you should now pursue for sustained D&A success. This session will:
1. Help you understand what data fabrics are and what they mean for your data strategy and architecture
2. Help you decide how to build and where to buy
3. Navigate the vendor landscape to assist in tech procurement decisions and aid your fabric journey

Three out of four companies are betting big on AI – but most are digging on shifting ground. In this $100 billion gold rush, none of these investments will pay off without data quality and strong governance – and that remains a challenge for many organizations. Not every enterprise has a solid data governance practice and maturity models vary widely. As a result, investments in innovation initiatives are at risk of failure. What are the most important data management issues to prioritize? See how your organization measures up and get ahead of the curve with Actian.

Summary

In this episode of the Data Engineering Podcast Sida Shen, product manager at CelerData, talks about StarRocks, a high-performance analytical database. Sida discusses the inception of StarRocks, which was forked from Apache Doris in 2020 and evolved into a high-performance Lakehouse query engine. He explains the architectural design of StarRocks, highlighting its capabilities in handling high concurrency and low latency queries, and its integration with open table formats like Apache Iceberg, Delta Lake, and Apache Hudi. Sida also discusses how StarRocks differentiates itself from other query engines by supporting on-the-fly joins and eliminating the need for denormalization pipelines, and shares insights into its use cases, such as customer-facing analytics and real-time data processing, as well as future directions for the platform.
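The "on-the-fly joins instead of denormalization pipelines" point can be illustrated with any SQL engine. The sketch below uses Python's built-in sqlite3 purely as a stand-in (the tables and data are invented for the example); StarRocks's claim is that it can run this kind of join fast enough at high concurrency that the pre-joined table becomes unnecessary:

```python
import sqlite3

# Stand-in illustration (using sqlite3, NOT StarRocks) of the pattern the
# episode describes: rather than pre-joining fact and dimension tables into
# a denormalized table via a pipeline, the engine joins them at read time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 10, 40.0), (3, 11, 10.0);
    INSERT INTO customers VALUES (10, 'EU'), (11, 'US');
""")

# The join happens on the fly at query time; no denormalization step needed.
rows = conn.execute("""
    SELECT c.region, SUM(o.amount)
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.region ORDER BY c.region
""").fetchall()
print(rows)  # [('EU', 65.0), ('US', 10.0)]
```

The tradeoff the episode alludes to: the denormalization pipeline moves join cost to write time, while a fast join engine keeps data normalized and pays the cost (cheaply) at read time.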

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Sida Shen about StarRocks, a high-performance analytical database supporting shared nothing and shared data patterns.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you describe what StarRocks is and the story behind it?
- There are numerous analytical databases on the market. What are the attributes of StarRocks that differentiate it from other options?
- Can you describe the architecture of StarRocks? What are the "-ilities" that are foundational to the design of the system?
- How have the design and focus of the project evolved since it was first created?
- What are the tradeoffs involved in separating the communication layer from the data layers?
- The tiered architecture enables the shared nothing and shared data behaviors, which allows for the implementation of lakehouse patterns. What are some of the patterns that are possible due to the single interface/dual pattern nature of StarRocks?
- The shared data implementation has caching built in to accelerate interaction with datasets. What are some of the limitations/edge cases that operators and consumers should be aware of?
- StarRocks supports management of lakehouse tables (Iceberg, Delta, Hudi, etc.), which overlaps with use cases for Trino/Presto/Dremio/etc. What are the cases where StarRocks acts as a replacement for those systems vs. a supplement to them?
- The other major category of engines that StarRocks overlaps with is OLAP databases (e.g. ClickHouse, Firebolt, etc.). Why might someone use StarRocks in addition to or in place of those technologies?
- We would be remiss if we ignored the dominating trend of AI and the systems that support it. What is the role of StarRocks in the context of an AI application?
- What are the most interesting, innovative, or unexpected ways that you have seen StarRocks used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on StarRocks?
- When is StarRocks the wrong choice?
- What do you have planned for the future of StarRocks?

Contact Info

- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

- StarRocks
- CelerData
- Apache Doris
- SIMD == Single Instruction Multiple Data
- Apache Iceberg
- ClickHouse (Podcast Episode)
- Druid
- Firebolt (Podcast Episode)
- Snowflake
- BigQuery
- Trino
- Databricks
- Dremio
- Data Lakehouse
- Delta Lake
- Apache Hive
- C++
- Cost-Based Optimizer
- Iceberg Summit Tencent Games Presentation
- Apache Paimon
- Lance (Podcast Episode)
- Delta Uniform
- Apache Arrow
- StarRocks Python UDF
- Debezium (Podcast Episode)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.

In this episode of Hub & Spoken, Jason Foster, CEO of Cynozure, goes solo and explores one of the most critical yet under-discussed business skills...decision making. Drawing on real-world examples, research, and personal experience, Jason unpacks why so many organisations struggle to make effective decisions, despite it being core to leadership, innovation, and progress. From decision paralysis to overconfidence, and data overload to gut instinct, he looks at the formal frameworks and informal dynamics that shape how choices are made across all levels of a business. The episode delves into the role of vision, bias, data literacy, and emotional intelligence, and outlines the essential skills leaders need to build confidence, clarity and adaptability into their decision-making culture. Whether you're leading a team, shaping strategy, or navigating change, this episode is packed with practical ideas to help you make better, faster, and more informed decisions. 🎧 Tune in now to rethink how decisions really get made, and how to make yours count. *****    Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023 and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation. 

SAS For Dummies, 3rd Edition

Become data-savvy with the widely used data and AI software. Data and analytics are essential for any business, giving insight into what's working, what can be improved, and what else needs to be done. SAS software helps you make sure you're doing data right, with a host of data management, reporting, and analysis tools. SAS For Dummies teaches you the essentials, helping you navigate this statistical software and turn information into value. In this book, learn how to gather data, create reports, and analyze results. You'll also discover how SAS machine learning and AI can help deliver decisions based on data. Even if you're brand new to data and analytics, this easy-to-follow guide will turn you into an SAS power user.

- Become familiar with the most popular SAS applications, including SAS 9 and SAS Viya
- Connect to data, organize your information, and adopt sound data security practices
- Get a primer on working with data sets, variables, and statistical analysis
- Explore and analyze data through SAS programming and rich application interfaces
- Create and share graphs and interactive visualizations to deliver insights

This is the perfect Dummies guide for new SAS users looking to improve their skills—in any industry and for any organization size.

Summary

In this episode of the Data Engineering Podcast Derek Collison, creator of NATS and CEO of Synadia, talks about the evolution and capabilities of NATS as a multi-paradigm connectivity layer for distributed applications. Derek discusses the challenges and solutions in building distributed systems, and highlights the unique features of NATS that differentiate it from other messaging systems. He delves into the architectural decisions behind NATS, including its ability to handle high-speed global microservices, support for edge computing, and integration with Jetstream for data persistence, and explores the role of NATS in modern data management and its use cases in industries like manufacturing and connected vehicles.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Derek Collison about NATS, a multi-paradigm connectivity layer for distributed applications.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you describe what NATS is and the story behind it?
- How have your experiences in past roles (Cloud Foundry, TIBCO messaging systems) informed the core principles of NATS?
- What other sources of inspiration have you drawn on in the design and evolution of NATS? (e.g. Kafka, RabbitMQ, etc.)
- There are several patterns and abstractions that NATS can support, many of which overlap with other well-regarded technologies. When designing a system or service, what are the heuristics that should be used to determine whether NATS should act as a replacement or addition to those capabilities? (e.g. considerations of scale, speed, ecosystem compatibility, etc.)
- There is often a divide in the technologies and architecture used between operational/user-facing applications and data systems. How does the unification of multiple messaging patterns in NATS shift the ways that teams think about the relationship between these use cases?
- How does the shared communication layer of NATS, with multiple protocol and pattern adapters, reduce the need to replicate data and logic across application and data layers?
- Can you describe how the core NATS system is architected?
- How have the design and goals of NATS evolved since you first started working on it?
- In the time since you first began writing NATS (~2012) there have been several evolutionary stages in both application and data implementation patterns. How have those shifts influenced the direction of the NATS project and its ecosystem?
- For teams who have an existing architecture, what are some of the patterns for adoption of NATS that allow them to augment or migrate their capabilities?
- What are some of the ecosystem investments that you and your team have made to ease the adoption and integration of NATS?
- What are the most interesting, innovative, or unexpected ways that you have seen NATS used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on NATS?
- When is NATS the wrong choice?
- What do you have planned for the future of NATS?

Contact Info

- GitHub
- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

- NATS
- NATS JetStream
- Synadia
- Cloud Foundry
- TIBCO
- Applied Physics Lab - Johns Hopkins University
- Cray Supercomputer
- RVCM Certified Messaging
- TIBCO ZMS
- IBM MQ
- JMS == Java Message Service
- RabbitMQ
- MongoDB
- NodeJS
- Redis
- AMQP == Advanced Message Queueing Protocol
- Pub/Sub Pattern
- Circuit Breaker Pattern
- Zero MQ
- Akamai
- Fastly
- CDN == Content Delivery Network
- At Most Once
- At Least Once
- Exactly Once
- AWS Kinesis
- Memcached
- SQS
- Segment
- Rudderstack (Podcast Episode)
- DLQ == Dead Letter Queue
- MQTT == Message Queueing Telemetry Transport
- NATS Kafka Bridge
- 10BaseT Network
- Web Assembly
- RedPanda (Podcast Episode)
- Pulsar Functions
- mTLS
- AuthZ (Authorization)
- AuthN (Authentication)
- NATS Auth Callouts
- OPA == Open Policy Agent
- RAG == Retrieval Augmented Generation (AI Engineering Podcast Episode)
- Home Assistant (Podcast.init Episode)
- Tailscale
- Ollama
- CDC == Change Data Capture
- gRPC

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.

Unlock Data Agility with Composable Data Architecture

Are your data systems slowing down your AI initiatives? The potential of AI to revolutionize business is undeniable, but many organizations struggle to bridge the gap between ambitious ideas and real-world results. The cause? Traditional data architectures remain too rigid and siloed to support today's dynamic, data-intensive demands. If you're a data leader searching for a solution, composable data architecture is the answer. This essential guide provides a clear, actionable framework for you to discover how this modular, adaptable approach empowers data teams, streamlines pipelines, and fuels continuous innovation. So, you'll not only keep pace with your most agile competitors—you'll surpass them.

- Understand the fundamental concepts that make composable architecture a game-changer
- Design pipelines that optimize performance and adapt to your organization's unique data needs
- See how composable architecture breaks down silos, enabling faster, more collaborative data processes
- Discover tools to streamline data management of high-volume streams or multicloud environments
- Leverage flexible architecture that simplifies data sharing, enabling easier access to insights
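The modular idea behind composable pipelines can be sketched in a few lines of Python: small, single-purpose stages composed into a pipeline, so a stage can be swapped or reordered without rewriting the whole flow. The stage names and sample data below are invented for illustration and are not tied to any particular product:

```python
from functools import reduce

# Toy sketch of composability: small single-purpose stages that can be
# mixed, reordered, or swapped without rewriting the whole pipeline.
# Stage names and sample data are invented for illustration.

def compose(*stages):
    """Chain stages left to right into a single pipeline function."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

def drop_nulls(rows):
    """Stage: discard records with a missing reading."""
    return [r for r in rows if r.get("value") is not None]

def to_celsius(rows):
    """Stage: convert Fahrenheit readings to Celsius."""
    return [{**r, "value": round((r["value"] - 32) * 5 / 9, 1)} for r in rows]

def tag_source(source):
    """Parameterized stage factory: returns a stage bound to one source name."""
    return lambda rows: [{**r, "source": source} for r in rows]

pipeline = compose(drop_nulls, to_celsius, tag_source("sensor-a"))
readings = [{"value": 212}, {"value": None}, {"value": 32}]
print(pipeline(readings))
# [{'value': 100.0, 'source': 'sensor-a'}, {'value': 0.0, 'source': 'sensor-a'}]
```

Swapping `to_celsius` for a different transform, or inserting a validation stage, changes one argument to `compose` rather than the pipeline's plumbing — the property the blurb is selling, reduced to its simplest form.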

Summary

In this episode of the Data Engineering Podcast Viktor Kessler, co-founder of Vakmo, talks about the architectural patterns in the lakehouse enabled by a fast and feature-rich Iceberg catalog. Viktor shares his journey from data warehouses to developing the open-source project, Lakekeeper, an Apache Iceberg REST catalog written in Rust that facilitates building lakehouses with essential components like storage, compute, and catalog management. He discusses the importance of metadata in making data actionable, the evolution of data catalogs, and the challenges and innovations in the space, including integration with OpenFGA for fine-grained access control and managing data across formats and compute engines.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Viktor Kessler about architectural patterns in the lakehouse that are unlocked by a fast and feature-rich Iceberg catalog.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you describe what LakeKeeper is and the story behind it? What is the core of the problem that you are addressing?
- There has been a lot of activity in the catalog space recently. What are the driving forces that have highlighted the need for a better metadata catalog in the data lake/distributed data ecosystem?
- How would you characterize the feature sets/problem spaces that different entrants are focused on addressing?
- Iceberg as a table format has gained a lot of attention and adoption across the data ecosystem. The REST catalog format has opened the door for numerous implementations. What are the opportunities for innovation and improving user experience in that space?
- What is the role of the catalog in managing security and governance? (AuthZ, auditing, etc.)
- What are the channels for propagating identity and permissions to compute engines? (How do you avoid head-scratching about permission denied situations?)
- Can you describe how LakeKeeper is implemented?
- How have the design and goals of the project changed since you first started working on it?
- For someone who has an existing set of Iceberg tables and catalog, what does the migration process look like?
- What new workflows or capabilities does LakeKeeper enable for data teams using Iceberg tables across one or more compute frameworks?
- What are the most interesting, innovative, or unexpected ways that you have seen LakeKeeper used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on LakeKeeper?
- When is LakeKeeper the wrong choice?
- What do you have planned for the future of LakeKeeper?

Contact Info

- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

- LakeKeeper
- SAP
- Microsoft Access
- Microsoft Excel
- Apache Iceberg (Podcast Episode)
- Iceberg REST Catalog
- PyIceberg
- Spark
- Trino
- Dremio
- Hive Metastore
- Hadoop
- NATS
- Polars
- DuckDB (Podcast Episode)
- DataFusion
- Atlan (Podcast Episode)
- Open Metadata (Podcast Episode)
- Apache Atlas
- OpenFGA
- Hudi (Podcast Episode)
- Delta Lake (Podcast Episode)
- Lance Table Format (Podcast Episode)
- Unity Catalog
- Polaris Catalog
- Apache Gravitino (Podcast Episode)
- Keycloak
- Open Policy Agent (OPA)
- Apache Ranger
- Apache NiFi

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.