talk-data.com

Topic

Data Quality

data_management data_cleansing data_validation

537 tagged

Activity Trend: 82 peak/qtr, 2020-Q1 to 2026-Q1

Activities

537 activities · Newest first

AI is transforming industries, but its success hinges on one critical factor: high-quality, abundant data. From ingestion to consumption and from modernization to validation, multiple AI agents work together to manage the data life cycle and maintain data quality. Yet as data growth slows, let's look at how synthetic data can fuel further growth. We’ll discuss where we are now, what comes next, and how to connect every dot in data modernization and migration.

This Session is hosted by a Google Cloud Next Sponsor.

Learn how Google Cloud is transforming data and AI governance with the latest governance and catalog innovations built directly into BigQuery. Unify metadata across platforms, boost discoverability, and accelerate insights with robust governance and security features. Discover how AI-powered insights, centralized governance, and integrated policy management can help improve data quality, track lineage, and manage access control, fostering trust and transparency in your data and AI initiatives.

Enhance your data ingestion architecture's resilience with Google Cloud's serverless solutions. Gain end-to-end visibility into your data's lineage—track each data point's transformation journey, including timestamps, user actions, and process outcomes. Implement real-time streaming and daily batch processes for Vertex AI Retail Search to deliver near real-time search capabilities while maintaining a daily backup for contingencies. Adopt best practices for data management, lineage tracking, and forensic capabilities to streamline issue diagnosis. This talk presents a scalable and fault-tolerant design that optimizes data quality and search performance while ensuring forensic-level traceability for every data movement.
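The lineage tracking described above (timestamps, user actions, and process outcomes per data point) could be sketched roughly as follows. This is only an illustration in plain Python: the `LineageEvent` and `LineageLog` names, their fields, and the in-memory storage are assumptions made for the example and do not represent the design presented in the talk or any Google Cloud API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    """One step in a record's journey (hypothetical structure)."""
    record_id: str   # which data point was touched
    step: str        # e.g. "ingest", "transform", "index"
    actor: str       # user or process that performed the step
    outcome: str     # "ok" or a failure label
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class LineageLog:
    """Append-only, in-memory lineage log (a real system would persist this)."""

    def __init__(self) -> None:
        self._events: list[LineageEvent] = []

    def record(self, record_id: str, step: str, actor: str, outcome: str) -> LineageEvent:
        event = LineageEvent(record_id, step, actor, outcome)
        self._events.append(event)
        return event

    def history(self, record_id: str) -> list[LineageEvent]:
        """All events for one record, in the order they were recorded."""
        return [e for e in self._events if e.record_id == record_id]

    def failures(self) -> list[LineageEvent]:
        """Events whose outcome was anything other than "ok"."""
        return [e for e in self._events if e.outcome != "ok"]
```

With a log like this, diagnosing an issue amounts to calling `history()` for the affected record and reading the steps in order.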

Join this Solutions Talk to see Virgin Media O2's self-service data mesh on Google Cloud, built on data tooling and data platform engineering. We'll show their real-world architecture for discoverable, trusted data products, emphasizing data quality, contracts, and management. Hear VMO2's data mesh vision, why contracts and quality matter, and how Google Cloud enables automation, leaving you with actionable insights to unlock your data's potential.

Data Usability in the Enterprise: How Usability Leads to Optimal Digital Experiences

Ensuring data usability is paramount to unlocking a company’s full potential and driving informed decision-making. Part of author Saurav Bhattacharya’s trilogy that covers the essential pillars of digital ecosystems (security, reliability, and usability), this book offers a comprehensive exploration of the fundamental concepts, principles, and practices essential for enhancing data accessibility and effectiveness. You’ll study the core aspects of data design, standardization, and interoperability, gaining the knowledge needed to create and maintain high-quality data environments.

By examining the tools and technologies that improve data usability, along with best practices for data visualization and user-centric strategies, this book serves as an invaluable resource for professionals seeking to leverage data more effectively. The book also addresses crucial governance issues, ensuring data quality, integrity, and security are maintained. Through a detailed analysis of data governance frameworks and privacy concerns, you’ll see how to manage data responsibly. Additionally, the book includes compelling case studies that highlight successful data usability implementations, future trends, and the challenges faced in achieving optimal data usability. By fostering a culture of data literacy and usability, this book will help you and your organization navigate the evolving data landscape and harness the power of data for innovation and growth.

What You Will Learn

* Understand the fundamental concepts and importance of data usability, including effective data design, enhancing data accessibility, and ensuring data standardization and interoperability.
* Review the latest tools and technologies that enhance data usability, best practices for data visualization, and strategies for implementing user-centric data approaches.
* Ensure data quality and integrity, while navigating data privacy and security concerns.
* Implement robust data governance frameworks to manage data responsibly and effectively.

Who This Book Is For

Cybersecurity and IT professionals

Shifting Left in Banking: Enhancing Machine Learning Models through Proactive Data Quality | Abhi Ghosh | Shift Left Data Conference 2025

Good data, rather than big data, is becoming more important in today's ecosystem. Machine learning models rely on good-quality data to make training more efficient and effective. We have traditionally applied data quality checks and balances in a manual, centralized way, putting much of the onus on our customers. Shifting data quality left brings the checks closer to where data is created and prevents bad data from flowing downstream. Auto-detecting, recommending, and auto-enforcing data quality rules will also make our customers' job easier while creating a more mature and robust data ecosystem.
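As a minimal sketch of what checking data where it is created might look like, the Python snippet below validates records before they are published downstream. The rule names, the field names, and the `validate_record` and `publish` helpers are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """A named data quality rule applied to a single field (hypothetical)."""
    name: str
    field: str
    check: Callable[[Any], bool]


# Illustrative rules for a made-up transaction record.
RULES = [
    Rule("account_id_present", "account_id",
         lambda v: isinstance(v, str) and v != ""),
    Rule("amount_non_negative", "amount",
         lambda v: isinstance(v, (int, float)) and v >= 0),
    Rule("currency_is_iso_code", "currency",
         lambda v: isinstance(v, str) and len(v) == 3 and v.isupper()),
]


def validate_record(record: dict, rules: list = RULES) -> list:
    """Return the names of all rules the record violates (empty means clean)."""
    return [r.name for r in rules if not r.check(record.get(r.field))]


def publish(records: list) -> tuple:
    """Split records into (accepted, rejected) so bad rows never flow downstream."""
    accepted, rejected = [], []
    for rec in records:
        failures = validate_record(rec)
        if failures:
            rejected.append((rec, failures))
        else:
            accepted.append(rec)
    return accepted, rejected
```

Keeping each rule as data (a name, a field, and a predicate) is one way to leave room for rules that are recommended or generated automatically rather than hand-written.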

The Rise of the Data-Conscious Software Engineer: Bridging the Data-Software Gap | Mark Freeman | Shift Left Data Conference 2025

Data teams increasingly embrace software engineering practices to address quality and integration challenges, yet friction remains between software and data teams. This talk explores why standard practices alone aren’t enough and introduces the concept of the “Data-Conscious Software Engineer,” an emerging role critical to bridging these organizational divides. Attendees will learn how identifying and empowering engineers who deeply understand both software development and data workflows can foster stronger collaboration, improve data quality, and drive organizational change toward treating data as a strategic asset.

Shift Left with Apache Iceberg Data Products to Power AI | Andrew Madson | Shift Left Data Conference 2025

High-quality, governed, and performant data from the outset is vital for agile, trustworthy enterprise AI systems. Traditional approaches delay addressing data quality and governance, causing inefficiencies and rework. Apache Iceberg, a modern table format for data lakes, empowers organizations to "Shift Left" by integrating data management best practices earlier in the pipeline to enable successful AI systems.

This session covers how Iceberg's schema evolution, time travel, ACID transactions, and Git-like data branching allow teams to validate, version, and optimize data at its source. Attendees will learn to create resilient, reusable data assets, streamline engineering workflows, enforce governance efficiently, and reduce late-stage transformations—accelerating analytics, machine learning, and AI initiatives.

Panel: Shift Left Across the Data Lifecycle—Data Contracts, Transformations, Observability, and Catalogs | Prukalpa Sankar, Tristan Handy, Barr Moses, Chad Sanderson | Shift Left Data Conference 2025

Join industry-leading CEOs Chad (Data Contracts), Tristan (Data Transformations), Barr (Data Observability), and Prukalpa (Data Catalogs) who are pioneering new approaches to operationalizing data by “Shifting Left.” This engaging panel will explore how embedding rigorous data management practices early in the data lifecycle reduces issues downstream, enhances data reliability, and empowers software engineers with clear visibility into data expectations. Attendees will gain insights into how data contracts define accountability, how effective transformations ensure data usability at scale, how proactive data and AI observability drives continuous confidence in data quality, and how catalogs enable data discoverability, accelerating innovation and trust across organizations.

Automating Data Quality via Shift Left for Real-Time Web Data Feeds at Industrial Scale | Sarah McKenna | Shift Left Data Conference 2025

Real-time web data is one of the hardest data streams to automate with trust, since websites don't want to be scraped, change constantly with no notice, and employ sophisticated bot-blocking mechanisms to try to stop automated data collection. At Sequentum we cut our teeth on web data and have come out with a general-purpose cloud platform for any type of data ingestion and data enrichment. Our clients can transparently audit it and ultimately trust it to deliver their mission-critical data on time and with quality to fuel their business decision-making.

Data Contracts in the Real World, the Adevinta Spain Implementation | Sergio Catoira | Shift Left Data Conference 2025

This talk covers Adevinta Spain's transition from a best-effort governance model to a governed data integration system by design. By creating source-aligned data products, this shift aims to enhance data quality and reliability from the moment data is ingested.

Shifting From Reactive to Proactive at Glassdoor | Zakariah Siyaji | Shift Left Data Conference 2025

As Glassdoor scaled to petabytes of data, ensuring data quality became critical for maintaining trust and supporting strategic decisions. Glassdoor implemented a proactive, “shift left” strategy focused on embedding data quality practices directly into the development process. This talk will detail how Glassdoor leveraged data contracts, static code analysis integrated into the CI/CD pipeline, and automated anomaly detection to empower software engineers and prevent data issues at the source. Attendees will learn how proactive data quality management reduces risk, promotes stronger collaboration across teams, enhances operational efficiency, and fosters a culture of trust in data at scale.
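To illustrate how a data contract check might run as a step in a CI/CD pipeline, here is a small sketch in plain Python. It is not Glassdoor's implementation: the `breaking_changes` and `ci_gate` functions, the schema representation (field name mapped to a type name), and the compatibility policy (added fields allowed, removed or retyped fields rejected) are all assumptions made for the example.

```python
def breaking_changes(contract: dict, proposed: dict) -> list:
    """List the ways a proposed schema breaks the agreed contract.

    Both arguments map field name -> type name, e.g. {"user_id": "string"}.
    Assumed policy: new fields are fine; removed or retyped fields are breaking.
    """
    problems = []
    for name, expected_type in sorted(contract.items()):
        if name not in proposed:
            problems.append(f"removed field: {name}")
        elif proposed[name] != expected_type:
            problems.append(
                f"type change: {name} {expected_type} -> {proposed[name]}"
            )
    return problems


def ci_gate(contract: dict, proposed: dict) -> int:
    """Return a process exit code: 0 if compatible, 1 if the build should fail."""
    problems = breaking_changes(contract, proposed)
    for problem in problems:
        print(f"CONTRACT VIOLATION: {problem}")
    return 1 if problems else 0
```

A pipeline step could pass the return value of `ci_gate` to `sys.exit`, so that a change which removes or retypes a contracted field fails before it is merged.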

Shifting Left with Data DevOps | Chad Sanderson | Shift Left Data Conference 2025

Data DevOps applies rigorous software development practices—such as version control, automated testing, and governance—to data workflows, empowering software engineers to proactively manage data changes and address data-related issues directly within application code. By adopting a "shift left" approach with Data DevOps, SWE teams become more aware of data requirements, dependencies, and expectations early in the software development lifecycle, significantly reducing risks, improving data quality, and enhancing collaboration.

This session will provide practical strategies for integrating Data DevOps into application development, enabling teams to build more robust data products and accelerate adoption of production AI systems.

Generative AI has transformed the financial services sector, sparking interest at all organizational levels. As AI becomes more accessible, professionals are exploring its potential to enhance their work. How can AI tools improve personalization and fraud detection? What efficiencies can be gained in product development and internal processes? These are the questions driving the adoption of AI as companies strive to innovate responsibly while maximizing value.

Andrew serves as the Chief Data Officer for Mastercard, leading the organization’s data strategy and innovation efforts while navigating current and future data risks. Andrew's prior roles at Mastercard include Senior Vice President, Data Management, in which he was responsible for the quality, collection, and use of data for Mastercard’s information services and advisory business, and Mastercard’s Deputy Chief Privacy Officer, in which he was responsible for privacy and data protection issues globally for Mastercard. Andrew also spent many years as a Privacy & Intellectual Property Counsel advising the direct marketing services, interactive advertising, and industrial chemicals industries. Andrew holds a Juris Doctor from Columbia University School of Law and a bachelor’s degree, cum laude, in Chemical Engineering from the University of Delaware. Andrew is a retired member of the State Bar of New York.

In the episode, Adel and Andrew explore GenAI's transformative impact on financial services, the democratization of AI tools, efficiency gains in product development, the importance of AI governance and data quality, the cultural shifts and regulatory landscapes shaping AI's future, and much more.
Links Mentioned in the Show:
* Mastercard
* Connect with Andrew
* Skill Track: Artificial Intelligence (AI) Leadership
* Related Episode: How Generative AI is Changing Leadership with Christie Smith, Founder of the Humanity Institute and Kelly Monahan, Managing Director, Research Institute
* Sign up to attend RADAR: Skills Edition

It’s time for another episode of the Data Engineering Central Podcast. In this episode, we cover …
* AWS Lambda + DuckDB and Delta Lake (Polars, Daft, etc.)
* IaC - Long Live Terraform
* Databricks Data Quality with DQX
* Unity Catalog releases for DuckDB and Polars
* Bespoke vs Managed Data Platforms
* Delta Lake vs. Iceberg and UniForm for a single table

Thanks for b…

Nine years ago, June joined a well-known tech company and found an opportunity to strengthen trust: in the data, in the reports, and in how effective the data team could truly be. Since then, the cornerstone of her work has been building trust. June shares stories about improving data quality, aligning strategies, and advancing data literacy. Learn how trust can shape collaboration and impact across any organization, and how to become a true partner in decision-making and business growth.

Historically speaking, digital analytics has focused predominantly on client-side tracking, but recent shifts in regulations, privacy and technology have driven analysts towards server-side solutions - primarily server-side tag management. While server-side solutions are starting to be more widely considered, full server-side tracking remains an underutilized opportunity. This talk unpacks the differences between client-side and server-side tracking (not tagging!), explores how server-side tracking can improve data quality, and demonstrates how integrating both approaches can elevate your behavioural data collection strategy.