talk-data.com

Topic

Data Management

data_governance data_quality metadata_management

1097 tagged

Activity Trend: peak of 88 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1097 activities · Newest first

Are you struggling to gain leadership support, craving stakeholder engagement, and begging for proper funding? Even though you may create analytic Gen AI wonders with your data, it won’t matter unless you explain the value in practical business terms. Join The Data Whisperer’s rollicking and riotous review of current buzzwords and some practical tips to help you bridge the story gap between data and the business. In this session, you’ll learn:

• How to differentiate between a data management narrative and other data storytelling efforts 

• Strategies to secure executive sponsorship and ongoing funding 

• The 3Vs of Data Storytelling for effective Data Management

In this session, Chad Sanderson, CEO of Gable.ai and author of the upcoming O’Reilly book: "Data Contracts," tackles the necessity of modern data management in an age of hyper iteration, experimentation, and AI. He will explore why traditional data management practices fail and how the cloud has fundamentally changed data development. The talk will cover a modern application of data management best practices, including data change detection, data contracts, observability, and CI/CD tests, and outline the roles of data producers and consumers. 

Attendees will leave with a clear understanding of modern data management's components and how to leverage them for better data handling and decision-making.
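To make the idea of a data contract enforced by a CI/CD test more concrete, here is a minimal sketch. It is not taken from the talk or from any Gable.ai product: the `CONTRACT` layout, the column names, and the `check_contract` helper are all hypothetical, chosen only to show how a producer's output might be checked against an agreed schema before a change ships.

```python
from typing import Any, Dict, List

# Hypothetical contract: column name -> expected Python type and nullability.
CONTRACT: Dict[str, Dict[str, Any]] = {
    "order_id": {"type": int, "nullable": False},
    "amount": {"type": float, "nullable": False},
    "coupon": {"type": str, "nullable": True},
}


def check_contract(rows: List[Dict[str, Any]],
                   contract: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable violations; an empty list means the rows conform."""
    violations: List[str] = []
    for i, row in enumerate(rows):
        # Columns the contract requires but the producer did not emit.
        for col in sorted(set(contract) - set(row)):
            violations.append(f"row {i}: missing column '{col}'")
        # Null and type checks for the columns that are present.
        for col, rule in contract.items():
            if col not in row:
                continue
            value = row[col]
            if value is None:
                if not rule["nullable"]:
                    violations.append(f"row {i}: '{col}' is null")
            elif not isinstance(value, rule["type"]):
                violations.append(
                    f"row {i}: '{col}' expected {rule['type'].__name__}, "
                    f"got {type(value).__name__}"
                )
    return violations


# Illustrative sample data: the first row conforms, the second does not.
sample_rows = [
    {"order_id": 1, "amount": 9.5, "coupon": None},
    {"order_id": "2", "amount": None},
]
problems = check_contract(sample_rows, CONTRACT)
```

In a CI pipeline, a test along the lines of `assert not problems` would fail the build when a producer change breaks the contract, which is one way the producer and consumer roles described above could be connected.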

The quality and usability of data determine the success of data-driven projects, and it has never been more critical to establish an operational pipeline of high-quality data that is both secure and accessible. Once distinct disciplines, Data Governance, Master Data Management, and Generative AI have converged to deliver data that is insight- and AI-ready in record time. Join our experts for practical examples and actionable advice you can use to get started.

Open table formats such as Apache Iceberg, Delta Lake, and Apache Hudi have dramatically transformed the data management landscape by enabling high-speed operations on massive datasets stored in object stores while maintaining ACID guarantees.

In this talk, we will explore the evolution and future of dataset versioning in the context of open table formats. Open table formats introduced the concept of table-level versioning and have become widely adopted standards. Data versioning systems that have emerged more recently, bringing best practices from software engineering into the data ecosystem, enable the management of multiple datasets within a large-scale data repository using Git-like semantics. Data versioning systems operate at the file level and are compatible with any open table format. On top of this, new catalogs that support these table formats and add a layer of access control are becoming the standard way to manage tabular datasets.

Despite these advancements, there remains a significant gap between current data versioning practices and the requirements for effective tabular dataset versioning.

The session will introduce the concept of a versioned catalog as a solution, demonstrating how it provides comprehensive data and metadata versioning for tables.

We’ll cover key requirements of tabular dataset management, including:

  • Capturing multi-table changes as single logical operations
  • Enabling seamless rollbacks without identifying each affected table
  • Implementing table format-aware versioning operations such as diff and merge
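The requirements above can be illustrated with a small in-memory sketch. This is an assumption-laden toy, not the speakers' system or any real catalog's API: `VersionedCatalog`, `Commit`, and the snapshot id strings are hypothetical names. Each commit records a snapshot id for every table, so a multi-table change is one logical operation and a rollback does not need to name the affected tables.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """One catalog version: every table name mapped to a snapshot id."""
    id: int
    parent: Optional[int]
    message: str
    tables: Dict[str, str]


class VersionedCatalog:
    """Toy catalog that versions the table-to-snapshot mapping as a whole."""

    def __init__(self) -> None:
        self._commits: List[Commit] = [Commit(0, None, "init", {})]
        self.head = 0

    def commit(self, message: str, changes: Dict[str, Optional[str]]) -> int:
        """Record changes to several tables as one commit (None drops a table)."""
        tables = dict(self._commits[self.head].tables)
        for name, snapshot in changes.items():
            if snapshot is None:
                tables.pop(name, None)
            else:
                tables[name] = snapshot
        new = Commit(len(self._commits), self.head, message, tables)
        self._commits.append(new)
        self.head = new.id
        return new.id

    def rollback(self, commit_id: int) -> None:
        """Point head at an earlier commit; all tables revert together."""
        if not 0 <= commit_id < len(self._commits):
            raise ValueError(f"unknown commit {commit_id}")
        self.head = commit_id

    def tables(self) -> Dict[str, str]:
        """Return a copy of the table-to-snapshot mapping at head."""
        return dict(self._commits[self.head].tables)

    def diff(self, a: int, b: int) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        """Map each table that differs between commits a and b to its (a, b) snapshots."""
        ta, tb = self._commits[a].tables, self._commits[b].tables
        return {
            name: (ta.get(name), tb.get(name))
            for name in sorted(set(ta) | set(tb))
            if ta.get(name) != tb.get(name)
        }


catalog = VersionedCatalog()
v1 = catalog.commit("initial load", {"orders": "s1", "customers": "s1"})
v2 = catalog.commit("backfill", {"orders": "s2", "customers": "s2"})
changed = catalog.diff(v1, v2)   # both tables differ between v1 and v2
catalog.rollback(v1)             # reverts both tables without naming them
```

A real implementation would also need table format-aware merge and conflict handling, which this sketch leaves out.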

Join us to explore the future of dataset versioning in the era of open table formats and evolving data management practices!

In the next five years, we are poised to witness a significant transformation towards modern data lake architecture across industries. This shift is driven by an urgent need for a unified, flexible, and scalable data management solution. Such a solution must address the challenges of siloed data environments and the increasing complexity of data sources while balancing the benefits of data mesh principles with centralized governance and semantic consistency.

In this talk, we will cover the latest trends and benefits in this field, including the use of open formats such as Iceberg, lower data movement costs, and multiple engines that support different workloads, all of which help establish a single source of truth.

Explore the symbiotic relationship between AI and data products. Discover how these advancements democratize data access, empowering Data Leaders to bridge the gap between IT, Data Owners, and Business Users. Gain insights on how to convert raw data into actionable insights for enterprise-scale operations. Learn strategies to leverage data as a strategic asset, overcoming challenges like data silos, guiding organizations towards sustainable data use, and unlocking transformative power in the AI era.

Data Observability is the new frontier of modern data management. Leading enterprises rely on Acceldata to ensure data quality, streamline operations, optimize costs, and maintain compliance. Join us to discover how to implement Enterprise Data Observability across on-prem and cloud environments, creating a single source of truth for data leaders, engineers, scientists, and business users. Learn from a seasoned industry expert who has successfully operationalized data governance at petabyte scale and delivers reliable data for AI and analytics initiatives.

In an era where data is the lifeblood of innovation, building modular, distributed, composable, and sustainable socio-technological architectures is essential. Managing data alone is not enough; we must also manage the domain-specific knowledge required to interpret, use, and integrate it effectively to support business strategy objectives. This talk explores a paradigm shift that integrates advanced knowledge frameworks into data management practices, enabling organizations to transform raw data into actionable insights with unprecedented accuracy and speed.

We will explore the core principles of knowledge-driven data management, including the utilization of artificial intelligence and machine learning to enhance data processing, the implementation of semantic technologies to improve data interoperability, and the adoption of knowledge graphs to create a more connected and intelligent data ecosystem.

Join us to discover how embracing a knowledge-driven approach to data management can empower your organization to harness the full potential of its data, driving innovation and achieving strategic objectives in today’s VUCA world.

How about a workplace where generative AI accelerates every data management task, transforming routine work into innovative experiences? AWS customers can put this vision into production in just 60 days through a combination of Amazon Bedrock, which enables rapid development and deployment of AI applications, and Stratio Generative AI Data Fabric, which provides accurate output based on quality data with business meaning. Join us to learn how a combination of these products is empowering data managers and chief data officers to drive innovation and efficiency across their organizations.

As Generative AI continues to revolutionize industries, having high-quality, well-prepared data has never been more crucial. In this session, Emma McGrattan, SVP of Engineering & Product at Actian, and Guillaume Bodet, CPTO at Zeenea, will explore how Zeenea's cutting-edge Data Discovery Platform, now part of Actian, is poised to play a pivotal role in achieving data readiness for GenAI. Attendees will discover how Zeenea’s metadata management solutions, including its comprehensive data catalog, lineage insights, quality index, business glossary, and data marketplace, empower organizations to truly know and trust their data. Join us to learn how to leverage these tools to mitigate risks, ensure compliance, and confidently unlock the full potential of GenAI in your organization. Don’t miss this opportunity to prepare your data for the next wave of AI innovation!

Speaker Bios:

Emma McGrattan, SVP of Engineering & Product, Actian

Emma is SVP of Engineering and Product at Actian leading global research and development. She is a recognized authority in data management and analytics technologies and holds multiple patents. Emma has over two decades of experience leading a global software development organization focused on innovation in high-performance analytics, data management, integration, and application development technologies. Prior to joining Actian, Emma was Vice President for Ingres at Computer Associates. Educated in Ireland, Emma holds a Bachelor of Electrical Engineering degree from Dublin City University.

In this short presentation, Big Data LDN Conference Chairman and Europe’s leading IT Industry Analyst in Data Management and Analytics, Mike Ferguson, will welcome everyone to Big Data LDN 2024. He will also summarise where companies are in data, analytics and AI in 2024, what the key challenges and trends are, how these trends are affecting the way companies build a data-driven enterprise, and where you can find out more about them at the show.

In this episode, host Jason Foster sits down with Anthony Deighton, CEO at Tamr, to delve into the complexities of data quality and analytics. They explore the challenges organisations face in managing and improving data quality, the pivotal role of AI in addressing these challenges, and strategies for aligning data quality initiatives with business objectives. They also explore the evolving role of central data teams, led by Chief Data Officers, in spearheading enterprise-wide data quality initiatives and how businesses can effectively tackle key challenges.


Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. They work with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and change management and leadership. The company was named one of The Sunday Times' fastest-growing private companies in 2022 and 2023 and named the Best Place to Work in Data by DataIQ in 2023.

Summary

As data architectures become more elaborate and the number of applications of data increases, it becomes increasingly challenging to locate and access the underlying data. Gravitino was created to provide a single interface to locate and query your data. In this episode Junping Du explains how Gravitino works, the capabilities that it unlocks, and how it fits into your data platform.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Your host is Tobias Macey and today I'm interviewing Junping Du about Gravitino, an open source metadata service for a unified view of all of your schemas

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Gravitino is and the story behind it?
  • What problems are you solving with Gravitino?
  • What are the methods that teams have relied on in the absence of Gravitino to address those use cases?
  • What led to the Hive Metastore being the default for so long?
  • What are the opportunities for innovation and new functionality in the metadata service?
  • The documentation suggests that Gravitino has overlap with a number of tool categories such as table schema (Hive metastore), metadata repository (Open Metadata), data federation (Trino/Alluxio). What are the capabilities that it can completely replace, and which will require other systems for more comprehensive functionality?
  • What are the capabilities that you are explicitly keeping out of scope for Gravitino?
  • Can you describe the technical architecture of Gravitino?
  • How have the design and scope evolved from when you first started working on it?
  • Can you describe how Gravitino integrates into an overall data platform?
  • In a typical day, what are the different ways that a data engineer or data analyst might interact with Gravitino?
  • One of the features that you highlight is centralized permissions management. Can you describe the access control model that you use for unifying across underlying sources?
  • What are the most interesting, innovative, or unexpected ways that you have seen Gravitino used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Gravitino?
  • When is Gravitino the wrong choice?
  • What do you have planned for the future of Gravitino?

Contact Info

  • LinkedIn
  • GitHub

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

  • Gravitino
  • Hadoop
  • Datastrato
  • PyTorch
  • Ray
  • Data Fabric
  • Hive
  • Iceberg (Podcast Episode)
  • Hive Metastore
  • Trino
  • OpenMetadata (Podcast Episode)
  • Alluxio
  • Atlan (Podcast Episode)
  • Spark
  • Thrift

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

In this brilliant episode, Jason Foster delves into the transformative impact of artificial intelligence on businesses and everyday life with Tom Goodwin, a world-renowned trends and transformation expert, keynote speaker, consultant, and author. Tom shares his expert insights on how technology reshapes the rules of business, creates new possibilities, and influences consumer behaviour. From discussing the hype surrounding AI to exploring its practical applications and the importance of adapting to technological advancements, this conversation offers a comprehensive look at the future of AI in the corporate world and beyond.


The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. We've released a special edition series of minisodes of our podcast. Recorded live at Data Connect 2024, our host Michael Toland engages in short, sweet, informative, and delightful conversations with five prevalent practitioners who are forging their way forward in data and technology.

Recorded on Day 2 of Data Connect 2024, Michael sits down with Vishaka Gupta-Cledat, CEO and co-founder of Aperture, a spin-off from Intel. They explore Aperture's mission to simplify the work of data scientists, data engineers, and machine learning teams.

About our host Michael Toland: Michael is a Product Management Coach and Consultant with Pathfinder Product, a Test Double Operation. Since 2016, Michael has worked on large-scale system modernizations and migration initiatives at Verizon. Outside his professional career, Michael serves as the Treasurer for the New Leaders Council, mentors with Venture for America, sings with the Columbus Symphony, and writes satire for his blog Dignified Product. He is excited to discuss data product management with the podcast audience.

About our guest Vishaka Gupta-Cledat: Vishaka is the Co-founder and CEO of ApertureData. Before launching ApertureData, she spent over seven years at Intel Labs, where she led the design and development of VDMS (the Visual Data Management System), which is now the foundation of ApertureData’s flagship product, ApertureDB. Her expertise spans diverse areas, including scheduling in heterogeneous multi-core environments, graph-based storage, applications on non-volatile memory systems, and tackling visual data management challenges for analytics use cases. Connect with Vishaka on LinkedIn.

All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn. Apply to be a guest or nominate a practitioner. Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights!

Join Jason Foster as he chats with Lara Burns, Chief Digital Officer at Scouts, about their ambitious digital transformation journey. Lara shares the challenges and successes in modernising one of the world's largest youth organisations, highlighting the critical role of digital tools in boosting volunteer engagement and operational efficiency. The episode also explores the emerging role of AI in enhancing data insights and the Scouts' commitment to educating young people on digital citizenship in a rapidly evolving technological landscape.


The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. We've released a special edition series of minisodes of our podcast. Recorded live at Data Connect 2024, our host Michael Toland engages in short, sweet, informative, and delightful conversations with five prevalent practitioners who are forging their way forward in data and technology.

In this minisode, Michael reconnects with his former colleague Lindsay Murphy as she delves into a crucial yet often overlooked aspect of data management: cost containment. Lindsay's session at Data Connect 2024 emphasizes the importance of considering costs as a critical piece of your data team's ROI. While data teams often focus on value creation and return on investment, they can easily lose sight of the expenses associated with the complex stacks they build. Lindsay offers practical insights on how to strike a balance between innovation and cost-efficiency. Plus, a special shout-out to Lindsay's new podcast, Women Lead Data. Hurrah! This podcast is set to inspire and empower women in the data industry, providing a platform for sharing experiences, insights, and strategies for success.

About our host Michael Toland: Michael is a Product Management Coach and Consultant with Pathfinder Product, a Test Double Operation. Since 2016, Michael has worked on large-scale system modernizations and migration initiatives at Verizon. Outside his professional career, Michael serves as the Treasurer for the New Leaders Council, mentors with Venture for America, sings with the Columbus Symphony, and writes satire for his blog Dignified Product. He is excited to discuss data product management with the podcast audience. Connect with Michael on LinkedIn.

About our guest Lindsay Murphy: Lindsay is a data leader with 13 years of experience in building and scaling data teams. She has successfully launched and led data initiatives at startups such as BenchSci, Maple, and Secoda. Her expertise includes developing internal data products, implementing modern data stack infrastructures, building and mentoring data engineering teams, and crafting data strategies that align with organizational goals. An active member of the data community, Lindsay organizes the Toronto Modern Data Stack Meetup group, which boasts over 2,500 members. She has also taught Advanced dbt to more than 100 students through Uplimit and hosts a weekly podcast, Women Lead Data, where she shares insights and amplifies the voices of women in the data industry. Connect with Lindsay on LinkedIn.

All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn. Apply to be a guest or nominate a practitioner. Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights!

Summary

In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Chris Bergh, CEO of DataKitchen, to discuss his ongoing mission to simplify the lives of data engineers. Chris explains the challenges faced by data engineers, such as constant system failures, the need for rapid changes, and high customer demands. Chris delves into the concept of DataOps, its evolution, and the misappropriation of related terms like data mesh and data observability. He emphasizes the importance of focusing on processes and systems rather than just tools to improve data engineering workflows. Chris also introduces DataKitchen's open-source tools, DataOps TestGen and DataOps Observability, designed to automate data quality validation and monitor data journeys in production.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
  • Your host is Tobias Macey and today I'm interviewing Chris Bergh about his tireless quest to simplify the lives of data engineers

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what DataKitchen is and the story behind it?
  • You helped to define and popularize "DataOps", which then went through a journey of misappropriation similar to "DevOps", and has since faded in use. What is your view on the realities of "DataOps" today?
  • Out of the popularized wave of "DataOps" tools came subsequent trends in data observability, data reliability engineering, etc. How have those cycles influenced the way that you think about the work that you are doing at DataKitchen?
  • The data ecosystem went through a massive growth period over the past ~7 years, and we are now entering a cycle of consolidation. What are the fundamental shifts that we have gone through as an industry in the management and application of data?
  • What are the challenges that never went away?
  • You recently open sourced the dataops-testgen and dataops-observability tools. What are the outcomes that you are trying to produce with those projects?
  • What are the areas of overlap with existing tools and what are the unique capabilities that you are offering?
  • Can you talk through the technical implementation of your new observability and quality testing platform?
  • What does the onboarding and integration process look like?
  • Once a team has one or both tools set up, what are the typical points of interaction that they will have over the course of their workday?
  • What are the most interesting, innovative, or unexpected ways that you have seen dataops-observability/testgen used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on promoting DataOps?
  • What do you have planned for the future of your work at DataKitchen?

Contact Info

  • LinkedIn

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

  • DataKitchen (Podcast Episode)
  • NASA
  • DataOps Manifesto
  • Data Reliability Engineering
  • Data Observability
  • dbt
  • DevOps Enterprise Summit
  • Building The Data Warehouse by Bill Inmon (affiliate link)
  • dataops-testgen, dataops-observability
  • Free Data Quality and Data Observability Certification
  • Databricks
  • DORA Metrics
  • DORA for data

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA