talk-data.com

Topic

Data Management

data_governance data_quality metadata_management

1097 tagged

Activity Trend: 88 peak/qtr (2020-Q1 to 2026-Q1)

Activities

1097 activities · Newest first

ThredUp’s Journey with Databricks: Modernizing Our Data Infrastructure

Building an AI-ready data platform requires strong governance, performance optimization, and seamless adoption of new technologies. At ThredUp, our Databricks journey began with a need for better data management and evolved into a full-scale transformation powering analytics, machine learning, and real-time decision-making. In this session, we’ll cover:

Key inflection points: Moving from legacy systems to a modernized Delta Lake foundation
Unity Catalog’s impact: Improving governance, access control, and data discovery
Best practices for onboarding: Ensuring smooth adoption for engineering and analytics teams
What’s next? Serverless SQL and conversational analytics with Genie

Whether you’re new to Databricks or scaling an existing platform, you’ll gain practical insights on navigating the transition, avoiding pitfalls, and maximizing AI and data intelligence.

Transforming Government With Data and AI: Singapore GovTech's Journey With Databricks

GovTech is an agency in the Singapore Government focused on tech for good. The GovTech Chief Data Office (CDO) has built the GovTech Data Platform with Databricks at the core. As the government tech agency, we safeguard national-level government and citizen data. A comprehensive data strategy is essential to uplifting data maturity. GovTech has adopted the service model approach where data services are offered to stakeholders based on their data maturity. Their maturity is uplifted through partnership, readying them for more advanced data analytics. CDO offers a plethora of data assets in a “data restaurant” ranging from raw data to data products, all delivered via Databricks and enabled through fine-grained access control, underpinned by data management best practices such as data quality, security and governance. Within our first year on Databricks, CDO was able to save 8,000 man-hours, democratize data across 50% of the agency and achieve six-figure savings through BI consolidation.

Data fabric and data mesh are not mutually exclusive. Join us in this two-speaker debate session that aims to put an end to a five-year-long debate comparing fabric and mesh. D&A leaders can deploy them independently or, in the best case, together. You will find out how you can deploy the fabric design to unify data management and the mesh operating model to distribute data management.

In a fragmented data landscape, reactive processes trap enterprise teams in firefighting mode, hindering innovation and scalability. Acceldata’s Agentic Data Management introduces a new paradigm, embedding proactive, AI-driven autonomy and cross-domain intelligence into data operations. By eliminating bottlenecks and reducing manual burden, it accelerates trustworthy insights and scalable governance. Join Acceldata to discover how forward-thinking enterprises are modernizing their data strategies to power innovation, and why autonomous operations are essential for thriving in an AI-first world.

Urgent investments in data, analytics and AI use cases have once more put the spotlight on strong data management foundations. Whether their data is even ready for upcoming AI, analytics and data sharing initiatives is now top of mind for heads of data, CDAOs and their counterparts. Data fabrics have emerged as a long-term, foundational data management architecture that you should now pursue for sustained D&A success. This session will:
1. Help you understand what data fabrics are and what they mean for your data strategy and architecture
2. Help you decide what to build and where to buy
3. Navigate the vendor landscape to assist in the tech procurement decisions that aid your fabric journey

Productivity and operational efficiency are among the key measures of business performance and economics. GenAI shows promise for improving the productivity and operational efficiency of the data management function and of data governance. Organizations should explore and assess those capabilities and align them with their strategic goals.

In today’s data-driven world, organizations are challenged to extract meaningful insights from complex, distributed information. A modern data intelligence platform brings together data management, AI/ML, and analytics to turn raw data into strategic advantage. This session explores how unified data architectures, augmented analytics, and intelligent applications are enabling smarter decisions and better business outcomes across industries. Real-world use cases—from demand forecasting to regulatory compliance—highlight the transformative impact of data intelligence. Powered by Oracle, this approach helps enterprises stay agile, informed, and competitive.

Data ecosystems, built on data fabric design and infused with AI, promise an integrated, cost effective, and operationally simple approach to varied data management challenges. However, they don't yet always deliver on that promise. This research explores the maturity of various ecosystem components and provides a guide for D&A leaders and others looking to invest in data foundations for competitive differentiation.

D&A leaders must develop DataOps as an essential practice to redefine their data management operations. This involves establishing business value before pursuing significant data engineering initiatives, and preventing duplicated efforts undertaken by different teams in managing the common metadata, security and observability of information assets within the data platforms.

Summary In this episode of the Data Engineering Podcast Mai-Lan Tomsen Bukovec, Vice President of Technology at AWS, talks about the evolution of Amazon S3 and its profound impact on data architecture. From her work on compute systems to leading the development and operations of S3, Mai-Lan shares insights on how S3 has become a foundational element in modern data systems, enabling scalable and cost-effective data lakes since its launch alongside Hadoop in 2006. She discusses the architectural patterns enabled by S3, the importance of metadata in data management, and how S3's evolution has been driven by customer needs, leading to innovations like strong consistency and S3 Tables.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
This is a pharmaceutical Ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome, also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business, automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like, so you can stop fighting over column names, and start trusting your data again. Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: Increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today to get a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th.
Your host is Tobias Macey and today I'm interviewing Mai-Lan Tomsen Bukovec about the evolutions of S3 and how it has transformed data architecture.

Interview
Introduction
How did you get involved in the area of data management?
Most everyone listening knows what S3 is, but can you start by giving a quick summary of what roles it plays in the data ecosystem?
What are the major generational epochs in S3, with a particular focus on analytical/ML data systems?
The first major driver of analytical usage for S3 was the Hadoop ecosystem. What are the other elements of the data ecosystem that helped shape the product direction of S3?
Data storage and retrieval have been core primitives in computing since its inception. What are the characteristics of S3 and all of its copycats that led to such a difference in architectural patterns vs. other shared data technologies? (e.g. NFS, Gluster, Ceph, Samba, etc.)
How does the unified pool of storage that is exemplified by S3 help to blur the boundaries between application data, analytical data, and ML/AI data?
What are some of the default patterns for storage and retrieval across those three buckets that can lead to anti-patterns which add friction when trying to unify those use cases?
The age of AI is leading to a massive potential for unlocking unstructured data, for which S3 has been a massive dumping ground over the years. How is that changing the ways that your customers think about the value of the assets that they have been hoarding for so long?
What new architectural patterns is that generating?
What are the most interesting, innovative, or unexpected ways that you have seen S3 used for analytical/ML/AI applications?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on S3?
When is S3 the wrong choice?
What do you have planned for the future of S3?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
AWS S3, Kinesis, Kafka, SQS, EMR, Drupal, Wordpress, Netflix Blog on S3 as a Source of Truth, Hadoop, MapReduce, Nasa JPL, FINRA == Financial Industry Regulatory Authority, S3 Object Versioning, S3 Cross Region, S3 Tables, Iceberg, Parquet, AWS KMS, Iceberg REST, DuckDB, NFS == Network File System, Samba, GlusterFS, Ceph, MinIO, S3 Metadata, Photoshop Generative Fill, Adobe Firefly, Turbotax AI Assistant, AWS Access Analyzer, Data Products, S3 Access Point, AWS Nova Models, LexisNexis Protege, S3 Intelligent Tiering, S3 Principal Engineering Tenets

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

How AdventHealth and Data Flow Modernized Investment Data | The Data Apps Conference

AdventHealth’s investment team needed a better way to combine real-time reporting on portfolio metrics with managing portfolio adjustments, moving beyond outdated, static reports that slowed decision-making and introduced inefficiencies.

In this session, Kelly Booth (Director of Data Strategy & Analytics, AdventHealth) and Jared Flores (Founder & Managing Director, Data Flow) will share how they partnered to modernize investment portfolio metrics reporting with a data app that enables real-time value adjustments, audit logging, and vendor tracking. This tool has helped:

Reduce manual errors by enabling real-time portfolio adjustments
Save hours of work by eliminating back-and-forth data corrections
Increase confidence in reporting with a fully auditable, write-back-enabled workflow

Join us to see a live demo of the app in action and hear how AdventHealth uses Sigma Data Apps to drive efficiency and accuracy in investment data management.

➡️ Learn more about Data Apps: https://www.sigmacomputing.com/product/data-applications?utm_source=youtube&utm_medium=organic&utm_campaign=data_apps_conference&utm_content=pp_data_apps


➡️ Sign up for your free trial: https://www.sigmacomputing.com/go/free-trial?utm_source=youtube&utm_medium=video&utm_campaign=free_trial&utm_content=free_trial

It's now easier than ever for less technical users to access, manage and analyze data without needing help from IT. But, self-service data management isn't always straightforward, and there are plenty of pitfalls, like data quality issues, skills gaps and governance concerns. This session will cover practical ways to make self-service data management work.

Behavioural data is fast becoming a cornerstone of modern business strategy. Not just for media measurement or advertising optimisation, but across product, pricing, logistics, and platform development. It tells us what people actually do, not just what they say they do. As traditional market research struggles with low engagement and recall bias, brands are turning to digital behavioural data to make sharper, faster decisions. Whether it's tracking consumer journeys in the app economy or identifying early adoption trends (like the impact of AI tools on category disruption), the value lies in real, observable behaviour at scale. But, that shift raises new questions around data ownership, consent, and fairness. And, the rise of AI is only accelerating both the opportunity and the complexity.

In the latest episode of Hub & Spoken, Jason Foster, CEO & Founder of Cynozure, speaks to Chris Havemann, CEO of RealityMine, and discusses everything from:

The transition from survey-based research to behavioural data analysis
The impact of AI on interpreting digital interactions
Ethical considerations surrounding data consent and transparency
Building trust through clear data collection and usage practices

Learn from Chris's 25+ years in data and insight, and explore how behavioural signals are reshaping everything from media to market intelligence.

****

Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023 and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation.

Summary In this episode of the Data Engineering Podcast Chakravarthy Kotaru talks about scaling data operations through standardized platform offerings. From his roots as an Oracle developer to leading the data platform at a major online travel company, Chakravarthy shares insights on managing diverse database technologies and providing databases as a service to streamline operations. He explains how his team has transitioned from DevOps to a platform engineering approach, centralizing expertise and automating repetitive tasks with AWS Service Catalog. Join them as they discuss the challenges of migrating legacy systems, integrating AI and ML for automation, and the importance of organizational buy-in in driving data platform success.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Your host is Tobias Macey and today I'm interviewing Chakri Kotaru about scaling successful data operations through standardized platform offerings.

Interview
Introduction
How did you get involved in the area of data management?
Can you start by outlining the different ways that you have seen teams you work with fail due to lack of structure and opinionated design?
Why NoSQL?
Pairing different styles of NoSQL for different problems
Useful patterns for each NoSQL style (document, column family, graph, etc.)
Challenges in platform automation and scaling edge cases
What challenges do you anticipate as a result of the new pressures from AI applications?
What are the most interesting, innovative, or unexpected ways that you have seen platform engineering practices applied to data systems?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data platform engineering?
When is NoSQL the wrong choice?
What do you have planned for the future of platform principles for enabling data teams/data applications?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
Riak, DynamoDB, SQL Server, Cassandra, ScyllaDB, CAP Theorem, Terraform, AWS Service Catalog, Blog Post

Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Shinji Kim to discuss the evolving role of semantic layers in the era of AI. As they explore the challenges of managing vast data ecosystems and providing context to data users, they delve into the significance of semantic layers for AI applications. They dive into the nuances of semantic modeling, the impact of AI on data accessibility, and the importance of business logic in semantic models. Shinji shares her insights on how SelectStar is helping teams navigate these complexities, and together they cover the future of semantic modeling as a native construct in data systems. Join them for an in-depth conversation on the evolving landscape of data engineering and its intersection with AI.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Your host is Tobias Macey and today I'm interviewing Shinji Kim about the role of semantic layers in the era of AI.

Interview
Introduction
How did you get involved in the area of data management?
Semantic modeling gained a lot of attention ~4-5 years ago in the context of the "modern data stack". What is your motivation for revisiting that topic today?
There are several overlapping concepts: "semantic layer," "metrics layer," "headless BI." How do you define these terms, and what are the key distinctions and overlaps?
Do you see these concepts converging, or do they serve distinct long-term purposes?
Data warehousing and business intelligence have been around for decades now. What new value does semantic modeling offer beyond practices like star schemas, OLAP cubes, etc.?
What benefits does a semantic model provide when integrating your data platform into AI use cases?
How is it different between using AI as an interface to your analytical use cases vs. powering customer facing AI applications with your data?
The effort to create and maintain a set of semantic models is non-zero. What role can LLMs play in helping to propose and construct those models?
For teams who have already invested in building this capability, what additional context and metadata is necessary to provide guidance to LLMs when working with their models?
What's the most effective way to create a semantic layer without turning it into a massive project?
There are several technologies available for building and serving these models. What are the selection criteria that you recommend for teams who are starting down this path?
What are the most interesting, innovative, or unexpected ways that you have seen semantic models used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working with semantic modeling?
When is semantic modeling the wrong choice?
What do you predict for the future of semantic modeling?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
SelectStar, Sun Microsystems, Markov Chain Monte Carlo, Semantic Modeling, Semantic Layer, Metrics Layer, Headless BI, Cube, Podcast Episode, AtScale, Star Schema, Data Vault, OLAP Cube, RAG == Retrieval Augmented Generation, AI Engineering Podcast Episode, KNN == K-Nearest Neighbors, HNSW == Hierarchical Navigable Small World, dbt Metrics Layer, Soda Data, LookML, Hex, PowerBI, Tableau, Semantic View (Snowflake), Databricks Genie, Snowflake Cortex Analyst, Malloy

In this episode of Hub & Spoken, Jason Foster speaks with Colin Zima, CEO and Co-founder of Omni, a modern business intelligence platform that combines the best of governance and usability. With a background spanning roles at Looker and Google, and two decades as both a data user and builder, Colin brings a unique perspective on the evolution of BI and the real role of AI in shaping its future. They explore why business intelligence remains critical for aligning organisations, how AI is raising the bar for access and self-service, and why semantics and business logic are more important than ever. The conversation challenges the notion that AI will replace dashboards, and instead focuses on how it can enhance accessibility, support different user needs, and empower data teams to work more efficiently. This episode is essential listening for business and data leaders thinking about the future of BI, the practical use of AI, and the role data teams play in delivering real value at speed. Tune in to hear how modern BI is evolving, and what leaders need to know to stay ahead.

Data fabric and data mesh are not mutually exclusive. Join this session to learn how D&A leaders can put an end to a five-year-long debate comparing fabric and mesh. You can deploy them independently or, in the best case, together. You will find out how you can deploy the fabric design to unify data management and the mesh operating model to distribute data management in a sensible manner.

For over a decade, we have sought a holistic, unifying theory of data management. This presentation documents the quest, and touches on the data and analytics infrastructure model (DAIM), metadata, data fabric, data ecosystems, and FinOps. Each of these is required and together they address everything from infrastructure to AI to strategy communications.

Summary In this episode of the Data Engineering Podcast Tulika Bhatt, a senior software engineer at Netflix, talks about her experiences with large-scale data processing and the future of data engineering technologies. Tulika shares her journey into the data engineering field, discussing her work at BlackRock and Verizon before joining Netflix, and explains the challenges and innovations involved in managing Netflix's impression data for personalization and user experience. She highlights the importance of balancing off-the-shelf solutions with custom-built systems using technologies like Spark, Flink, and Iceberg, and delves into the complexities of ensuring data quality and observability in high-speed environments, including robust alerting strategies and semantic data auditing.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Your host is Tobias Macey and today I'm interviewing Tulika Bhatt about her experiences working on large scale data processing and her insights on the future trajectory of the supporting technologies.

Interview
Introduction
How did you get involved in the area of data management?
Can you start by outlining the ways that operating at large scale changes the ways that you need to think about the design of data systems?
When dealing with small-scale data systems it can be feasible to have manual processes. What are the elements of large-scale data systems that demand automation?
How can those large-scale automation principles be down-scaled to the systems that the rest of the world are operating?
A perennial problem in data engineering is that of data quality. The past 4 years have seen a significant growth in the number of tools and practices available for automating the validation and verification of data. In your experience working with high volume data flows, what are the elements of data validation that are still unsolved?
Generative AI has taken the world by storm over the past couple years. How has that changed the ways that you approach your daily work?
What do you see as the future realities of working with data across various axes of large scale, real-time, etc.?
What are the most interesting, innovative, or unexpected ways that you have seen solutions to large-scale data management designed?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data management across axes of scale?
What are the ways that you are thinking about the future trajectory of your work?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
BlackRock, Spark, Flink, Kafka, Cassandra, RocksDB, Netflix Maestro workflow orchestrator, Pagerduty, Iceberg