talk-data.com

Topic

Data Quality

data_management data_cleansing data_validation

537 tagged

Activity Trend: 82 peak/qtr (2020-Q1 to 2026-Q1)

Activities

537 activities · Newest first

Large Language Models (LLMs) are transformative, but static knowledge and hallucinations limit their direct enterprise use. Retrieval-Augmented Generation (RAG) is the standard solution, yet moving from prototype to production is fraught with challenges in data quality, scalability, and evaluation.

This talk argues the future of intelligent retrieval lies not in better models, but in a unified, data-first platform. We'll demonstrate how the Databricks Data Intelligence Platform, built on a Lakehouse architecture with integrated tools like Mosaic AI Vector Search, provides the foundation for production-grade RAG.

Looking ahead, we'll explore the evolution beyond standard RAG to advanced architectures like GraphRAG, which enable deeper reasoning within Compound AI Systems. Finally, we'll show how the end-to-end Mosaic AI Agent Framework provides the tools to build, govern, and evaluate the intelligent agents of the future, capable of reasoning across the entire enterprise.
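
To make the retrieval step concrete, here is a minimal, vendor-neutral sketch of the retrieve-then-generate loop behind RAG. The toy bag-of-words "embedding" and sample documents are illustrative stand-ins for a managed vector index such as the Mosaic AI Vector Search mentioned above, not an implementation of it.

```python
import math
from collections import Counter

# Illustrative knowledge base; a production system would pull governed documents
# from a lakehouse table and index them in a vector store.
DOCS = [
    "Invoices are approved by the finance team within five business days.",
    "Data quality incidents are triaged by the on-call data engineer.",
    "Customer PII must be masked before it reaches analytics tables.",
]

def embed(text: str) -> Counter:
    """Toy embedding: lower-cased token counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank documents by similarity to the question and keep the top k."""
    q = embed(question)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question: str) -> str:
    """Ground the LLM by injecting retrieved context into the prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("Who handles data quality incidents?"))
```

The prompt built here would then be passed to the generation model; everything upstream of that call is where the data quality, scalability, and evaluation challenges described above live.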

Three out of four companies are betting big on AI – but most are digging on shifting ground. In this $100 billion gold rush, none of these investments will pay off without data quality and strong governance – and that remains a challenge for many organizations. Not every enterprise has a solid data governance practice, and maturity models vary widely. As a result, investments in innovation initiatives are at risk of failure. What are the most important data management issues to prioritize? See how your organization measures up and get ahead of the curve with Actian.

Discover how to build a powerful AI Lakehouse and unified data fabric natively on Google Cloud. Leverage BigQuery's serverless scale and robust analytics capabilities as the core, seamlessly integrating open data formats with Apache Iceberg and efficient processing using managed Spark environments like Dataproc. Explore the essential components of this modern data environment, including data architecture best practices, sound integration strategies, rigorous data quality assurance, and efficient metadata management with Google Cloud Data Catalog. Learn how Google Cloud's comprehensive ecosystem accelerates advanced analytics, preparing your data for sophisticated machine learning initiatives and enabling direct connection to services like Vertex AI.
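
As a rough illustration of the open-format layer described above, the sketch below registers an Iceberg catalog on a Spark cluster (such as Dataproc) and writes a table that other engines can then read. The catalog name, bucket, and table are hypothetical, and a real cluster also needs the Iceberg Spark runtime package and GCS permissions.

```python
from pyspark.sql import SparkSession

# Configure an Iceberg catalog backed by a (hypothetical) GCS warehouse path.
spark = (
    SparkSession.builder
    .appName("iceberg-lakehouse-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "hadoop")
    .config("spark.sql.catalog.lakehouse.warehouse", "gs://example-lakehouse/warehouse")
    .getOrCreate()
)

# Create an open-format Iceberg table and load a sample row; downstream engines
# can read the same data without copying it into a proprietary format.
spark.sql("CREATE NAMESPACE IF NOT EXISTS lakehouse.sales")
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.sales.orders (
        order_id BIGINT,
        amount   DOUBLE,
        order_ts TIMESTAMP
    ) USING iceberg
""")
spark.sql("""
    INSERT INTO lakehouse.sales.orders
    VALUES (1, 19.99, TIMESTAMP '2024-01-01 10:00:00')
""")
```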

As AI adoption accelerates across industries, many organisations are realising that building a model is only the beginning. Real-world deployment of AI demands robust infrastructure, clean and connected data, and secure, scalable MLOps pipelines. In this panel, experts from across the AI ecosystem share lessons from the frontlines of operationalising AI at scale.

We’ll dig into the tough questions:

• What are the biggest blockers to AI adoption in large enterprises — and how can we overcome them?

• Why does bad data still derail even the most advanced models, and how can we fix the data quality gap?

• Where does synthetic data fit into real-world AI pipelines — and how do we define “real” data?

• Is Agentic AI the next evolution, or just noise — and how should MLOps prepare?

• What does a modern, secure AI stack look like when using external partners and APIs?

Expect sharp perspectives on data integration, model lifecycle management, and the cyber-physical infrastructure needed to make AI more than just a POC.

A pioneer of the low-code market since 2001, enterprise software delivery company OutSystems has evolved rapidly alongside the changing data landscape. With a global presence and a vast community of over 750,000 members, OutSystems continues to leverage innovative tools, including data observability and generative AI, to help its customers succeed.

In this session, Pedro Sá Martins, Head of Data Engineering, will share the evolution of OutSystems’ data landscape, including how OutSystems has partnered with Snowflake, Fivetran and Monte Carlo to address their modern data challenges. He’ll share best practices for implementing scalable data quality programs to drive innovative technologies, as well as what’s on the data horizon for the OutSystems team.

Data governance often begins with Data Defense — centralized stewardship focused on compliance and regulatory needs, built on passive metadata, manual documentation, and heavy SME reliance. While effective for audits, this top-down approach offers limited business value. 

Data governance has since moved to a Data Offense model that drives monetization of critical data assets by focusing on analytics and data science outcomes: better decision-making and improved customer and associate experiences. This involves integrating data quality and observability with a shift-left approach grounded in tangible impact on business outcomes, improved governance maturity, and accelerated resolution of business-impacting issues.

The next iteration advances Data Stewardship to AI-Augmented and Autonomous Stewardship — embedding SME knowledge into automated workflows, managing critical assets autonomously, and delivering actionable context through proactive, shift-left observability, producer–consumer contracts, and SLAs built into data product development.
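
As a rough sketch of what a producer–consumer contract with a built-in SLA might look like in code, the snippet below checks required columns, a null-rate threshold, and a freshness SLA. The contract fields, table name, and thresholds are illustrative assumptions, not any specific product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DataContract:
    table: str
    required_columns: set[str]
    max_null_rate: float      # e.g. 0.01 allows at most 1% nulls per required column
    freshness_sla: timedelta  # how stale the most recent load may be

def check_contract(contract: DataContract, rows: list[dict], loaded_at: datetime) -> list[str]:
    """Return a list of violations; an empty list means the contract is met."""
    violations = []
    missing = contract.required_columns - set(rows[0].keys() if rows else [])
    if missing:
        violations.append(f"{contract.table}: missing columns {sorted(missing)}")
    for col in contract.required_columns - missing:
        null_rate = sum(r.get(col) is None for r in rows) / max(len(rows), 1)
        if null_rate > contract.max_null_rate:
            violations.append(
                f"{contract.table}.{col}: null rate {null_rate:.1%} exceeds {contract.max_null_rate:.1%}"
            )
    if datetime.now(timezone.utc) - loaded_at > contract.freshness_sla:
        violations.append(f"{contract.table}: data older than SLA of {contract.freshness_sla}")
    return violations

# Hypothetical producer output checked against the consumer-facing contract.
contract = DataContract(
    table="orders",
    required_columns={"order_id", "amount"},
    max_null_rate=0.01,
    freshness_sla=timedelta(hours=6),
)
rows = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": None}]
print(check_contract(contract, rows, loaded_at=datetime.now(timezone.utc) - timedelta(hours=8)))
```

In a shift-left setup, a check like this runs in the producer's pipeline before data is published, so violations surface to the team that can fix them rather than to downstream consumers.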

Ten years ago, I began advocating for **DataOps**, a framework designed to improve collaboration, efficiency, and agility in data management. The industry was still grappling with fragmented workflows, slow delivery cycles, and a disconnect between data teams and business needs. Fast forward to today, and the landscape has transformed, but have we truly embraced the future of leveraging data at scale? This session will reflect on the evolution of DataOps, examining what’s changed, what challenges persist, and where we're headed next.

**Key Takeaways:**

✅ The biggest wins and ongoing struggles in implementing DataOps over the last decade. 

✅ Practical strategies for improving automation, governance, and data quality in modern workflows. 

✅ How emerging trends like AI-driven automation and real-time analytics are reshaping the way we approach data management. 

✅ Actionable insights on how data teams can stay agile and align better with business objectives. 

**Why Attend?**

If you're a data professional, architect, or leader striving for operational excellence, this talk will equip you with the knowledge to future-proof your data strategies.

Penguin Random House, the world’s largest trade book publisher, relies on data to power every part of its global business, from supply chain operations to editorial workflows and royalty reconciliation. As the complexity of PRH’s dbt pipelines grew, manual checks and brittle tests could no longer keep pace. The Data Governance team knew they needed a smarter, scalable approach to ensure trusted data.

In this session, Kerry Philips, Head of Data Governance at Penguin Random House, will reveal how the team transformed data quality using Sifflet’s observability platform. Learn how PRH integrated column-level lineage, business-rule-aware logic, and real-time alerts into a single workspace, turning fragmented testing into a cohesive strategy for trust, transparency, and agility.

Attendees will gain actionable insights on:

- Rapidly deploying observability without disrupting existing dbt workflows

- Encoding business logic into automated data tests, as sketched below

- Reducing incident resolution times and freeing engineers to innovate

- Empowering analysts to act on data with confidence

If you’ve ever wondered how a company managing millions of ISBNs ensures every dashboard tells the truth, this session offers a behind-the-scenes look at how data observability became PRH’s newest bestseller.
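
Below is a minimal, tool-agnostic sketch of what encoding a business rule as an automated data test can look like. The ISBN rule, column names, and sample rows are hypothetical, and in practice such logic would live in a dbt test or an observability monitor rather than a standalone script.

```python
import pandas as pd

def test_isbn13_has_13_digits(books: pd.DataFrame) -> pd.DataFrame:
    """Business rule: every published title must carry a 13-digit ISBN.
    Returns the rows that violate the rule."""
    published = books[books["status"] == "published"]
    return published[~published["isbn13"].astype(str).str.fullmatch(r"\d{13}")]

# Hypothetical sample data; the second row deliberately breaks the rule.
books = pd.DataFrame(
    {
        "title": ["Book A", "Book B", "Book C"],
        "isbn13": ["9780000000001", "978-000000000", "9780000000003"],
        "status": ["published", "published", "draft"],
    }
)

violations = test_isbn13_has_13_digits(books)
if not violations.empty:
    # A scheduled test would fail here and alert the owning team.
    raise AssertionError(f"{len(violations)} row(s) violate the ISBN rule:\n{violations}")
```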

AI-powered development tools are accelerating development speed across the board, and analytics event implementation is no exception, but used without appropriate guardrails they are very capable of creating organizational chaos. Same company, same prompt, completely different schemas—data teams can’t analyze what should be identical events across platforms.

The infrastructure assumptions that worked when developers shipped tracking changes in sprint cycles or quarters are breaking when they ship them multiple times per day. Schema inconsistency, cost surprises from experimental traffic, and trust erosion in AI-generated code are becoming the new normal.

Josh will demonstrate how Snowplow’s MCP (Model Context Protocol) server and data-structure toolchains enable teams to harness AI development speed while maintaining data quality and architectural consistency. Using Snowplow’s production approach of AI-powered design paired with deterministic implementation, teams get rapid iteration without the hallucination bugs that plague direct AI code generation.

Key Takeaways:

• How AI development acceleration is fragmenting analytics schemas within organizations

• Architectural patterns that separate AI creativity from production reliability

• Real-world implementation using MCP, Data Products, and deterministic code generation
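
To illustrate the "AI-powered design, deterministic implementation" split described above, here is a minimal sketch in which every event is validated against a fixed JSON Schema before entering the pipeline, so AI-assisted tracking changes cannot silently fragment the design. The schema and event are hypothetical, not an actual Snowplow data structure or MCP call.

```python
from jsonschema import Draft7Validator

# Agreed event schema; an AI assistant may help draft it, but once fixed it is
# enforced deterministically for every tracking call.
CHECKOUT_STARTED_SCHEMA = {
    "type": "object",
    "properties": {
        "cart_value": {"type": "number", "minimum": 0},
        "currency": {"type": "string", "pattern": "^[A-Z]{3}$"},
        "item_count": {"type": "integer", "minimum": 1},
    },
    "required": ["cart_value", "currency", "item_count"],
    "additionalProperties": False,
}

validator = Draft7Validator(CHECKOUT_STARTED_SCHEMA)

def validate_event(event: dict) -> list[str]:
    """Return human-readable validation errors; an empty list means the event conforms."""
    return [error.message for error in validator.iter_errors(event)]

# An AI-generated tracking call that drifted from the agreed schema is rejected
# (wrong currency casing, camelCase property, missing item_count).
print(validate_event({"cart_value": 42.5, "currency": "gbp", "itemCount": 2}))
```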

In an era where data complexity and scale challenge every organization, manual intervention can no longer keep pace. Prizm by DQLabs redefines the paradigm—offering a no-touch, agentic data platform that seamlessly integrates Data Quality, Observability, and Semantic Intelligence into one self-learning, self-optimizing ecosystem.

Unlike legacy systems, Prizm is AI-native and Agentic by Design, built from the ground up around a network of intelligent, role-driven agents that observe, recommend, act, and learn in concert to deliver continuous, autonomous data trust.

Join us at Big Data London to discover how Prizm’s agent-driven anomaly detection, data quality enforcement, and deep semantic analysis set a new industry standard—shifting data and AI trust from an operational burden to a competitive advantage that powers actionable, insight-driven outcomes.

The relationship between AI assistants and data professionals is evolving rapidly, creating both opportunities and challenges. These tools can supercharge workflows by generating SQL, assisting with exploratory analysis, and connecting directly to databases—but they're far from perfect. How do you maintain the right balance between leveraging AI capabilities and preserving your fundamental skills? As data teams face mounting pressure to deliver AI-ready data and demonstrate business value, what strategies can ensure your work remains trustworthy? With issues ranging from biased algorithms to poor data quality potentially leading to serious risks, how can organizations implement responsible AI practices while still capitalizing on the positive applications of this technology?

Christina Stathopoulos is an international data specialist who regularly serves as an executive advisor, consultant, educator, and public speaker. With expertise in analytics, data strategy, and data visualization, she has built a distinguished career in technology, including roles at Fortune 500 companies. Most recently, she spent over five years at Google and Waze, leading data strategy and driving cross-team projects. Her professional journey has spanned both the United States and Spain, where she has combined her passion for data, technology, and education to make data more accessible and impactful for all. Christina also plays a unique role as a “data translator,” helping to bridge the gap between business and technical teams to unlock the full value of data assets. She is the founder of Dare to Data, a consultancy created to formalize and structure her work with some of the world’s leading companies, supporting and empowering them in their data and AI journeys. Current and past clients include IBM, PepsiCo, PUMA, Shell, Whirlpool, Nitto, and Amazon Web Services.

In the episode, Richie and Christina explore the role of AI agents in data analysis, the evolving workflow with AI assistance, the importance of maintaining foundational skills, the integration of AI in data strategy, the significance of trustworthy AI, and much more.

Links Mentioned in the Show: Dare to Data · Julius AI · Connect with Christina · Course - Introduction to SQL with AI · Related Episode: The Data to AI Journey with Gerrit Kazmaier, VP & GM of Data Analytics at Google Cloud · Rewatch RADAR AI


EBMT, one of the biggest medical registries in Europe, has rebuilt its core data system from scratch, after 20 years of service, to keep up with growing data needs, modern technologies, and the evolving needs of researchers in blood and marrow transplantation. The new AWS-based system supports data collection and analysis at scale, using cloud infrastructure and business intelligence tools to improve data quality and data usability across EBMT’s network.

Uber Eats processes millions of daily orders across Europe, generating vast amounts of data. But only quality data drives real impact. In this session, we’ll explore how strong data foundations enable personalized experiences, boost retention, and fuel growth. Learn how to cut through the noise, avoid common pitfalls, and use high-quality data to make smarter, faster, and more customer-centric decisions.

After four years of building Vanderlande's data platform, we've learned that theoretical purity often collides with practical reality. This presentation shares our journey from rigid architectural principles to pragmatic solutions that truly scale. Discover how we're simplifying layer structures, standardizing with YAML, rethinking data quality implementation, and finding the right balance between data mesh theory and practical data products that deliver value.