talk-data.com

Topic: Data Quality

Tags: data_management, data_cleansing, data_validation

16 tagged activities

Activity trend: 82 peak/quarter (2020-Q1 to 2026-Q1)

Activities (filtered by: Big Data LDN 2025)

Arch Capital Group, a $34 billion S&P 500 specialty insurance leader managing $21.5 billion in gross premiums across 60+ global offices, faced a critical challenge: ensuring data quality and consistency across their complex risk assessment operations. With 25+ predictive models supporting AI-driven underwriting for specialty lines—the industry's most complex and unusual risks—incomplete or inaccurate data inputs threatened the accuracy of critical business decisions spanning property & casualty, reinsurance, and mortgage insurance operations. 

In this session, Sam from Arch Capital shares how the organization partnered with DQLabs to transform their data trust framework, implementing automated quality checks across their global data ecosystem. Learn how this transformation enabled Arch to maintain their disciplined underwriting approach while scaling operations, improve regulatory compliance across multiple jurisdictions, and enhance their ability to respond rapidly to emerging risks while supporting the data accuracy essential for their leadership position in specialty insurance markets.
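The "automated quality checks" mentioned above are described only at a high level; as a purely illustrative sketch (not DQLabs' or Arch's actual implementation), the snippet below runs basic completeness and validity checks over a hypothetical policy dataset with pandas. All column names and thresholds are assumptions.

```python
# Illustrative only: generic completeness/validity checks on a hypothetical
# policy dataset. Column names and thresholds are invented for the example.
import pandas as pd

REQUIRED_COLUMNS = ["policy_id", "gross_premium", "inception_date", "line_of_business"]

def run_quality_checks(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable data quality failures."""
    failures = []

    # Completeness: every required column must be present and non-null.
    for col in REQUIRED_COLUMNS:
        if col not in df.columns:
            failures.append(f"missing column: {col}")
        elif df[col].isna().any():
            failures.append(f"{int(df[col].isna().sum())} null values in {col}")

    # Validity: premiums should be positive; duplicate keys break downstream joins.
    if "gross_premium" in df.columns and (df["gross_premium"] <= 0).any():
        failures.append("non-positive gross_premium values found")
    if "policy_id" in df.columns and df["policy_id"].duplicated().any():
        failures.append("duplicate policy_id values found")

    return failures

if __name__ == "__main__":
    sample = pd.DataFrame({
        "policy_id": [1, 2, 2],
        "gross_premium": [125_000.0, -10.0, 98_500.0],
        "inception_date": ["2025-01-01", "2025-02-15", None],
        "line_of_business": ["property", "casualty", "mortgage"],
    })
    for failure in run_quality_checks(sample):
        print("FAIL:", failure)
```

In practice, checks like these would run automatically against each new batch and feed alerting and remediation workflows rather than printing to stdout.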

Sound AI outcomes start with trusted, high-quality data, and delivering it efficiently is now a core part of every data and AI strategy. In this session, we’ll discuss how AI-supportive capabilities such as autonomous data catalogs, unstructured metadata ingestion, and automated data trust scoring are transforming how organizations deliver AI-ready data products at scale with less hands-on staff involvement.
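"Automated data trust scoring" is likewise described only in outline; one hypothetical way to picture it is a weighted blend of completeness, freshness, and validity metrics, as in the sketch below. The weights and metric definitions are invented for illustration and are not the vendor's methodology.

```python
# Hypothetical trust score: a weighted blend of completeness, freshness and
# validity metrics, each normalised to [0, 1]. Weights are arbitrary examples.
from dataclasses import dataclass

@dataclass
class TableMetrics:
    completeness: float  # share of non-null required cells, 0..1
    freshness: float     # 1.0 if updated within SLA, decaying toward 0 otherwise
    validity: float      # share of rows passing rule-based checks, 0..1

WEIGHTS = {"completeness": 0.4, "freshness": 0.3, "validity": 0.3}

def trust_score(m: TableMetrics) -> float:
    score = (WEIGHTS["completeness"] * m.completeness
             + WEIGHTS["freshness"] * m.freshness
             + WEIGHTS["validity"] * m.validity)
    return round(score, 3)

print(trust_score(TableMetrics(completeness=0.98, freshness=0.75, validity=0.92)))  # 0.893
```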

You’ll see how GenAI and agentic AI can accelerate reliable data delivery at every stage, from identifying and fixing data issues to building semantic business layers that give your AI models the context-rich inputs needed for success. We’ll also explore how agentic AI enables self-updating catalogs, proactive data quality monitoring, and automated remediation to free your teams to focus on innovation instead of maintenance.

If you’re shaping your organization’s data and AI strategy as a CDO, CDAIO, CIO, or data leader, this is your blueprint for operationalizing trusted, governed, and AI-ready data for every initiative, faster and smarter.

As organisations scale their data ecosystems, ensuring consistency, compliance, and usability across multiple data products becomes a critical challenge. This session explores a practical approach to implementing a Data Governance framework that balances control with agility.

Key takeaways:

- We will discuss key principles, common pitfalls, and best practices for aligning governance with business objectives while fostering innovation.

- Attendees will gain insights into designing governance policies, automating compliance, and driving adoption across decentralised data teams (a small policy-as-code sketch follows this list).

- Real-world examples will illustrate how to create a scalable, federated model that enhances data quality, security, and interoperability across diverse data products.
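To make the "automating compliance" point concrete, here is a minimal, hypothetical policy-as-code check: each data product declares a little metadata, and a script enforces a few governance rules (an accountable owner, tagged PII, a retention period) before the product is published. The metadata fields and rules are assumptions chosen for illustration, not a prescribed framework.

```python
# Hypothetical policy-as-code check for data products in a federated setup.
# The metadata schema and the three rules below are illustrative assumptions.
PRODUCTS = [
    {"name": "customer_360", "owner": "crm-team", "contains_pii": True,
     "pii_tags": ["email", "phone"], "retention_days": 365},
    {"name": "clickstream_raw", "owner": None, "contains_pii": True,
     "pii_tags": [], "retention_days": None},
]

def violations(product: dict) -> list[str]:
    issues = []
    if not product.get("owner"):
        issues.append("no accountable owner assigned")
    if product.get("contains_pii") and not product.get("pii_tags"):
        issues.append("PII declared but no columns tagged")
    if not product.get("retention_days"):
        issues.append("no retention period defined")
    return issues

for p in PRODUCTS:
    problems = violations(p)
    status = "PASS" if not problems else "FAIL: " + "; ".join(problems)
    print(f"{p['name']}: {status}")
```

In a federated model, checks like these typically run in CI for every data product repository, so central governance sets the rules while domain teams keep ownership of their pipelines.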

Large Language Models (LLMs) are transformative, but static knowledge and hallucinations limit their direct enterprise use. Retrieval-Augmented Generation (RAG) is the standard solution, yet moving from prototype to production is fraught with challenges in data quality, scalability, and evaluation.

This talk argues the future of intelligent retrieval lies not in better models, but in a unified, data-first platform. We'll demonstrate how the Databricks Data Intelligence Platform, built on a Lakehouse architecture with integrated tools like Mosaic AI Vector Search, provides the foundation for production-grade RAG.

Looking ahead, we'll explore the evolution beyond standard RAG to advanced architectures like GraphRAG, which enable deeper reasoning within Compound AI Systems. Finally, we'll show how the end-to-end Mosaic AI Agent Framework provides the tools to build, govern, and evaluate the intelligent agents of the future, capable of reasoning across the entire enterprise.
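As a rough sketch of the RAG pattern the talk describes: retrieve the most relevant chunks from a vector index, then ground the LLM's answer in them. The `search_index` and `generate` helpers below are placeholders (on Databricks they would be backed by Mosaic AI Vector Search and a model-serving endpoint); they are assumptions, not the platform's actual API.

```python
# Minimal RAG loop. The two helpers are placeholders for a vector search
# backend and an LLM endpoint; swap in your platform's clients.
from typing import Callable

def answer_with_rag(
    question: str,
    search_index: Callable[[str, int], list[str]],  # returns top-k text chunks
    generate: Callable[[str], str],                 # calls the LLM
    k: int = 4,
) -> str:
    # 1. Retrieve: fetch the k chunks most similar to the question.
    chunks = search_index(question, k)

    # 2. Augment: build a grounded prompt so the model answers from the chunks.
    context = "\n\n".join(f"[{i+1}] {c}" for i, c in enumerate(chunks))
    prompt = (
        "Answer using only the context below. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

    # 3. Generate: the LLM produces an answer constrained by the retrieved context.
    return generate(prompt)

# Toy usage with stubbed dependencies.
docs = ["Invoices are archived after 7 years.", "Refunds require manager approval."]
stub_search = lambda q, k: docs[:k]
stub_llm = lambda p: "Refunds require manager approval [2]."
print(answer_with_rag("Who approves refunds?", stub_search, stub_llm))
```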

Three out of four companies are betting big on AI – but most are digging on shifting ground. In this $100 billion gold rush, none of these investments will pay off without data quality and strong governance – and that remains a challenge for many organizations. Not every enterprise has a solid data governance practice, and maturity models vary widely. As a result, investments in innovation initiatives are at risk of failure. What are the most important data management issues to prioritize? See how your organization measures up and get ahead of the curve with Actian.

Discover how to build a powerful AI Lakehouse and unified data fabric natively on Google Cloud. Leverage BigQuery's serverless scale and robust analytics capabilities as the core, seamlessly integrating open data formats with Apache Iceberg and efficient processing using managed Spark environments like Dataproc. Explore the essential components of this modern data environment, including data architecture best practices, robust integration strategies, high data quality assurance, and efficient metadata management with Google Cloud Data Catalog. Learn how Google Cloud's comprehensive ecosystem accelerates advanced analytics, preparing your data for sophisticated machine learning initiatives and enabling direct connection to services like Vertex AI. 
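A small sketch of the Dataproc-plus-BigQuery piece: reading a BigQuery table from a managed Spark environment via the spark-bigquery connector and running a basic quality check on it. The project, dataset, table, and column names are placeholders, and the connector is assumed to be on the cluster's classpath (it ships with recent Dataproc images).

```python
# Sketch: read a BigQuery table from Spark (e.g. on Dataproc) and run a simple
# quality check. Assumes the spark-bigquery connector is on the classpath;
# project/dataset/table and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bq-quality-check").getOrCreate()

orders = (
    spark.read.format("bigquery")
    .option("table", "my-project.sales.orders")  # placeholder table
    .load()
)

# Basic quality signal: null rate of a key business column.
total = orders.count()
null_amounts = orders.filter(F.col("order_amount").isNull()).count()
null_rate = null_amounts / total if total else 0.0
print(f"rows={total}, null order_amount rate={null_rate:.2%}")

# Fail fast in a pipeline if the null rate breaches an (illustrative) threshold.
assert null_rate < 0.01, "order_amount null rate above 1% threshold"
```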

As AI adoption accelerates across industries, many organisations are realising that building a model is only the beginning. Real-world deployment of AI demands robust infrastructure, clean and connected data, and secure, scalable MLOps pipelines. In this panel, experts from across the AI ecosystem share lessons from the frontlines of operationalising AI at scale.

We’ll dig into the tough questions:

• What are the biggest blockers to AI adoption in large enterprises — and how can we overcome them?

• Why does bad data still derail even the most advanced models, and how can we fix the data quality gap?

• Where does synthetic data fit into real-world AI pipelines — and how do we define “real” data?

• Is Agentic AI the next evolution, or just noise — and how should MLOps prepare?

• What does a modern, secure AI stack look like when using external partners and APIs?

Expect sharp perspectives on data integration, model lifecycle management, and the cyber-physical infrastructure needed to make AI more than just a POC.

A pioneer of the low-code market since 2001, enterprise software delivery provider OutSystems has evolved rapidly alongside the changing landscape of data. With a global presence and a vast community of over 750,000 members, OutSystems continues to leverage innovative tools, including data observability and generative AI, to help their customers succeed.

In this session, Pedro Sá Martins, Head of Data Engineering, will share the evolution of OutSystems’ data landscape, including how OutSystems has partnered with Snowflake, Fivetran and Monte Carlo to address their modern data challenges. He’ll share best practices for implementing scalable data quality programs to drive innovative technologies, as well as what’s on the data horizon for the OutSystems team.

Data governance often begins with Data Defense — centralized stewardship focused on compliance and regulatory needs, built on passive metadata, manual documentation, and heavy SME reliance. While effective for audits, this top-down approach offers limited business value. 

Data Governance has since moved to a Data Offense model, driving monetization of critical data assets by focusing on analytics and data science outcomes: better decision-making and improved customer and associate experiences. This involves integrating data quality and observability with a shift-left approach grounded in tangible impact on business outcomes, improved governance maturity, and accelerated resolution of business-impacting issues.

The next phase advances Data Stewardship toward AI-augmented and autonomous stewardship: embedding SME knowledge into automated workflows, managing critical assets autonomously, and delivering actionable context through proactive, shift-left observability, producer–consumer contracts, and SLAs built into data product development.
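As a concrete (and hypothetical) illustration of the producer–consumer contracts and SLAs mentioned above, the sketch below declares a minimal contract for a data product and checks an incoming batch against it. The field names, types, and SLA window are invented for the example.

```python
# Hypothetical data contract: schema expectations plus a freshness SLA,
# checked at the producer/consumer boundary. All names and values are examples.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DataContract:
    required_fields: dict[str, type]   # column name -> expected Python type
    freshness_sla: timedelta           # maximum allowed age of the batch

def check_batch(contract: DataContract, rows: list[dict], loaded_at: datetime) -> list[str]:
    issues = []
    # Schema: every required field must be present with the agreed type.
    for name, expected in contract.required_fields.items():
        for row in rows:
            if name not in row:
                issues.append(f"missing field '{name}'")
                break
            if row[name] is not None and not isinstance(row[name], expected):
                issues.append(f"field '{name}' is not {expected.__name__}")
                break
    # Freshness SLA: the batch must not be older than the agreed window.
    if datetime.now(timezone.utc) - loaded_at > contract.freshness_sla:
        issues.append("freshness SLA breached")
    return issues

contract = DataContract(
    required_fields={"customer_id": int, "email": str},
    freshness_sla=timedelta(hours=6),
)
batch = [{"customer_id": 42, "email": "a@example.com"}, {"customer_id": "x", "email": None}]
print(check_batch(contract, batch, loaded_at=datetime.now(timezone.utc)))
# -> ["field 'customer_id' is not int"]
```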

Ten years ago, I began advocating for **DataOps**, a framework designed to improve collaboration, efficiency, and agility in data management. The industry was still grappling with fragmented workflows, slow delivery cycles, and a disconnect between data teams and business needs. Fast forward to today, and the landscape has transformed, but have we truly embraced the future of leveraging data at scale? This session will reflect on the evolution of DataOps, examining what’s changed, what challenges persist, and where we're headed next.

**Key Takeaways:**

✅ The biggest wins and ongoing struggles in implementing DataOps over the last decade. 

✅ Practical strategies for improving automation, governance, and data quality in modern workflows. 

✅ How emerging trends like AI-driven automation and real-time analytics are reshaping the way we approach data management. 

✅ Actionable insights on how data teams can stay agile and align better with business objectives. 

**Why Attend?**

If you're a data professional, architect, or leader striving for operational excellence, this talk will equip you with the knowledge to future-proof your data strategies.

Penguin Random House, the world’s largest trade book publisher, relies on data to power every part of its global business, from supply chain operations to editorial workflows and royalty reconciliation. As the complexity of PRH’s dbt pipelines grew, manual checks and brittle tests could no longer keep pace. The Data Governance team knew they needed a smarter, scalable approach to ensure trusted data.

In this session, Kerry Philips, Head of Data Governance at Penguin Random House, will reveal how the team transformed data quality using Sifflet’s observability platform. Learn how PRH integrated column-level lineage, business-rule-aware logic, and real-time alerts into a single workspace, turning fragmented testing into a cohesive strategy for trust, transparency, and agility.

Attendees will gain actionable insights on:

- Rapidly deploying observability without disrupting existing dbt workflows

- Encoding business logic into automated data tests

- Reducing incident resolution times and freeing engineers to innovate

- Empowering analysts to act on data with confidence

If you’ve ever wondered how a company managing millions of ISBNs ensures every dashboard tells the truth, this session offers a behind-the-scenes look at how data observability became PRH’s newest bestseller.
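The "business-rule-aware logic" above is only described at a high level; as a generic illustration (not Sifflet's or PRH's actual rules), the test below encodes one plausible publishing rule, that royalty rates fall within contractual bounds and ISBNs are well formed, as an automated check over a pandas DataFrame. Column names and bounds are assumptions.

```python
# Illustrative business-rule test: royalty rates must sit inside contractual
# bounds and every ISBN must be 13 digits. Columns and bounds are invented.
import pandas as pd

def test_royalty_rules(df: pd.DataFrame) -> list[str]:
    failures = []
    out_of_bounds = df[(df["royalty_rate"] < 0.05) | (df["royalty_rate"] > 0.25)]
    if not out_of_bounds.empty:
        failures.append(f"{len(out_of_bounds)} rows with royalty_rate outside 5%-25%")
    bad_isbn = df[~df["isbn"].astype(str).str.fullmatch(r"\d{13}")]
    if not bad_isbn.empty:
        failures.append(f"{len(bad_isbn)} rows with malformed ISBN-13")
    return failures

sample = pd.DataFrame({
    "isbn": ["9780141036144", "97801410"],
    "royalty_rate": [0.10, 0.40],
})
print(test_royalty_rules(sample))  # two failures expected
```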

AI-powered development tools are accelerating development speed across the board, and analytics event implementation is no exception. Without appropriate guardrails, though, they are very capable of creating organizational chaos: same company, same prompt, completely different schemas, and data teams can’t analyze what should be identical events across platforms.

The infrastructure assumptions that worked when developers shipped tracking changes in sprint cycles or quarters are breaking when they ship them multiple times per day. Schema inconsistency, cost surprises from experimental traffic, and trust erosion in AI-generated code are becoming the new normal.

Josh will demonstrate how Snowplow’s MCP (Model Context Protocol) server and data-structure toolchains enable teams to harness AI development speed while maintaining data quality and architectural consistency. Using Snowplow’s production approach of AI-powered design paired with deterministic implementation, teams get rapid iteration without the hallucination bugs that plague direct AI code generation.
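The "deterministic implementation" half of that approach can be pictured as schema-first validation: an event is accepted only if it matches a versioned schema, regardless of what the AI-generated tracking code sends. The sketch below uses the jsonschema library with an invented event schema; it is a generic illustration, not Snowplow's MCP server or data-structure toolchain.

```python
# Generic schema-first validation: accept an analytics event only if it matches
# a versioned JSON Schema. The schema and event fields are illustrative.
from jsonschema import validate, ValidationError

CHECKOUT_STARTED_V1 = {
    "type": "object",
    "properties": {
        "event": {"const": "checkout_started"},
        "cart_value": {"type": "number", "minimum": 0},
        "currency": {"type": "string", "pattern": "^[A-Z]{3}$"},
    },
    "required": ["event", "cart_value", "currency"],
    "additionalProperties": False,
}

def accept(event: dict) -> bool:
    """Deterministically accept or reject an event against the agreed schema."""
    try:
        validate(instance=event, schema=CHECKOUT_STARTED_V1)
        return True
    except ValidationError as err:
        print(f"rejected: {err.message}")
        return False

# Two 'AI-generated' variants of the same event: only the schema-conformant one passes.
accept({"event": "checkout_started", "cart_value": 59.99, "currency": "GBP"})
accept({"event": "checkout_started", "basket_total": 59.99, "currency": "gbp"})
```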

Key Takeaways:

• How AI development acceleration is fragmenting analytics schemas within organizations

• Architectural patterns that separate AI creativity from production reliability

• Real-world implementation using MCP, Data Products, and deterministic code generation

In an era where data complexity and scale challenge every organization, manual intervention can no longer keep pace. Prizm by DQLabs redefines the paradigm—offering a no-touch, agentic data platform that seamlessly integrates Data Quality, Observability, and Semantic Intelligence into one self-learning, self-optimizing ecosystem.

Unlike legacy systems, Prizm is AI-native and agentic by design: built from the ground up around a network of intelligent, role-driven agents that observe, recommend, act, and learn in concert to deliver continuous, autonomous data trust.

Join us at Big Data London to discover how Prizm’s agent-driven anomaly detection, data quality enforcement, and deep semantic analysis set a new industry standard, shifting data and AI trust from an operational burden to a competitive advantage that powers actionable, insight-driven outcomes.
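Agent-driven anomaly detection is described here only at the marketing level; as a generic, hypothetical illustration of the underlying idea, the sketch below flags a daily row count that deviates sharply from its recent history using a simple z-score. It is not Prizm's algorithm, and the window size and threshold are arbitrary choices.

```python
# Generic anomaly check: flag today's row count if it is far from the recent
# mean (simple z-score). Window size and threshold are illustrative choices.
from statistics import mean, stdev

def is_anomalous(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold

daily_row_counts = [102_300, 101_950, 102_800, 102_100, 101_700, 102_450, 102_050]
print(is_anomalous(daily_row_counts, today=55_000))   # True: likely a broken load
print(is_anomalous(daily_row_counts, today=102_600))  # False: within normal range
```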
