talk-data.com

Topic: Data Quality

Tags: data_management, data_cleansing, data_validation

16 tagged activities

Activity trend: 82 peak/quarter (2020-Q1 to 2026-Q1)

Activities (filtered by: Big Data LDN 2025)

Arch Capital Group, a $34 billion S&P 500 specialty insurance leader managing $21.5 billion in gross premiums across 60+ global offices, faced a critical challenge: ensuring data quality and consistency across their complex risk assessment operations. With 25+ predictive models supporting AI-driven underwriting for specialty lines—the industry's most complex and unusual risks—incomplete or inaccurate data inputs threatened the accuracy of critical business decisions spanning property & casualty, reinsurance, and mortgage insurance operations. 

In this session, Sam from Arch Capital shares how the organization partnered with DQLabs to transform their data trust framework, implementing automated quality checks across their global data ecosystem. Learn how this transformation enabled Arch to maintain their disciplined underwriting approach while scaling operations, improve regulatory compliance across multiple jurisdictions, and enhance their ability to respond rapidly to emerging risks while supporting the data accuracy essential for their leadership position in specialty insurance markets.
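The "automated quality checks" mentioned above are described only at a high level; as a purely illustrative sketch (not DQLabs' or Arch's actual implementation), the snippet below runs basic completeness and validity checks over a hypothetical policy dataset with pandas. All column names and thresholds are assumptions.

```python
# Illustrative only: generic completeness/validity checks on a hypothetical
# policy dataset. Column names and thresholds are invented for the example.
import pandas as pd

REQUIRED_COLUMNS = ["policy_id", "gross_premium", "inception_date", "line_of_business"]

def run_quality_checks(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable data quality failures."""
    failures = []

    # Completeness: every required column must be present and non-null.
    for col in REQUIRED_COLUMNS:
        if col not in df.columns:
            failures.append(f"missing column: {col}")
        elif df[col].isna().any():
            failures.append(f"{int(df[col].isna().sum())} null values in {col}")

    # Validity: premiums should be positive; duplicate keys break downstream joins.
    if "gross_premium" in df.columns and (df["gross_premium"] <= 0).any():
        failures.append("non-positive gross_premium values found")
    if "policy_id" in df.columns and df["policy_id"].duplicated().any():
        failures.append("duplicate policy_id values found")

    return failures

if __name__ == "__main__":
    sample = pd.DataFrame({
        "policy_id": [1, 2, 2],
        "gross_premium": [125_000.0, -10.0, 98_500.0],
        "inception_date": ["2025-01-01", "2025-02-15", None],
        "line_of_business": ["property", "casualty", "mortgage"],
    })
    for failure in run_quality_checks(sample):
        print("FAIL:", failure)
```

In practice, checks like these would run automatically against each new batch and feed alerting and remediation workflows rather than printing to stdout.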

Sound AI outcomes start with trusted, high-quality data, and delivering it efficiently is now a core part of every data and AI strategy. In this session, we’ll discuss how AI-supportive capabilities such as autonomous data catalogs, unstructured metadata ingestion, and automated data trust scoring are transforming how organizations deliver AI-ready data products at scale with less hands-on staff involvement.
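"Automated data trust scoring" is likewise described only in outline; one hypothetical way to picture it is a weighted blend of completeness, freshness, and validity metrics, as in the sketch below. The weights and metric definitions are invented for illustration and are not the vendor's methodology.

```python
# Hypothetical trust score: a weighted blend of completeness, freshness and
# validity metrics, each normalised to [0, 1]. Weights are arbitrary examples.
from dataclasses import dataclass

@dataclass
class TableMetrics:
    completeness: float  # share of non-null required cells, 0..1
    freshness: float     # 1.0 if updated within SLA, decaying toward 0 otherwise
    validity: float      # share of rows passing rule-based checks, 0..1

WEIGHTS = {"completeness": 0.4, "freshness": 0.3, "validity": 0.3}

def trust_score(m: TableMetrics) -> float:
    score = (WEIGHTS["completeness"] * m.completeness
             + WEIGHTS["freshness"] * m.freshness
             + WEIGHTS["validity"] * m.validity)
    return round(score, 3)

print(trust_score(TableMetrics(completeness=0.98, freshness=0.75, validity=0.92)))  # 0.893
```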

You’ll see how GenAI and agentic AI can accelerate reliable data delivery at every stage, from identifying and fixing data issues to building semantic business layers that give your AI models the context-rich inputs needed for success. We’ll also explore how agentic AI enables self-updating catalogs, proactive data quality monitoring, and automated remediation to free your teams to focus on innovation instead of maintenance.

If you’re shaping your organization’s data and AI strategy as a CDO, CDAIO, CIO, or data leader, this is your blueprint for operationalizing trusted, governed, and AI-ready data for every initiative, faster and smarter.

As organisations scale their data ecosystems, ensuring consistency, compliance, and usability across multiple data products becomes a critical challenge. This session explores a practical approach to implementing a Data Governance framework that balances control with agility.

Key takeaways:

- We will discuss key principles, common pitfalls, and best practices for aligning governance with business objectives while fostering innovation.

- Attendees will gain insights into designing governance policies, automating compliance, and driving adoption across decentralised data teams (a small policy-as-code sketch follows this list).

- Real-world examples will illustrate how to create a scalable, federated model that enhances data quality, security, and interoperability across diverse data products.
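To make the "automating compliance" point concrete, here is a minimal, hypothetical policy-as-code check: each data product declares a little metadata, and a script enforces a few governance rules (an accountable owner, tagged PII, a retention period) before the product is published. The metadata fields and rules are assumptions chosen for illustration, not a prescribed framework.

```python
# Hypothetical policy-as-code check for data products in a federated setup.
# The metadata schema and the three rules below are illustrative assumptions.
PRODUCTS = [
    {"name": "customer_360", "owner": "crm-team", "contains_pii": True,
     "pii_tags": ["email", "phone"], "retention_days": 365},
    {"name": "clickstream_raw", "owner": None, "contains_pii": True,
     "pii_tags": [], "retention_days": None},
]

def violations(product: dict) -> list[str]:
    issues = []
    if not product.get("owner"):
        issues.append("no accountable owner assigned")
    if product.get("contains_pii") and not product.get("pii_tags"):
        issues.append("PII declared but no columns tagged")
    if not product.get("retention_days"):
        issues.append("no retention period defined")
    return issues

for p in PRODUCTS:
    problems = violations(p)
    status = "PASS" if not problems else "FAIL: " + "; ".join(problems)
    print(f"{p['name']}: {status}")
```

In a federated model, checks like these typically run in CI for every data product repository, so central governance sets the rules while domain teams keep ownership of their pipelines.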

Large Language Models (LLMs) are transformative, but static knowledge and hallucinations limit their direct enterprise use. Retrieval-Augmented Generation (RAG) is the standard solution, yet moving from prototype to production is fraught with challenges in data quality, scalability, and evaluation.

This talk argues the future of intelligent retrieval lies not in better models, but in a unified, data-first platform. We'll demonstrate how the Databricks Data Intelligence Platform, built on a Lakehouse architecture with integrated tools like Mosaic AI Vector Search, provides the foundation for production-grade RAG.

Looking ahead, we'll explore the evolution beyond standard RAG to advanced architectures like GraphRAG, which enable deeper reasoning within Compound AI Systems. Finally, we'll show how the end-to-end Mosaic AI Agent Framework provides the tools to build, govern, and evaluate the intelligent agents of the future, capable of reasoning across the entire enterprise.
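As a rough sketch of the RAG pattern the talk describes: retrieve the most relevant chunks from a vector index, then ground the LLM's answer in them. The `search_index` and `generate` helpers below are placeholders (on Databricks they would be backed by Mosaic AI Vector Search and a model-serving endpoint); they are assumptions, not the platform's actual API.

```python
# Minimal RAG loop. The two helpers are placeholders for a vector search
# backend and an LLM endpoint; swap in your platform's clients.
from typing import Callable

def answer_with_rag(
    question: str,
    search_index: Callable[[str, int], list[str]],  # returns top-k text chunks
    generate: Callable[[str], str],                 # calls the LLM
    k: int = 4,
) -> str:
    # 1. Retrieve: fetch the k chunks most similar to the question.
    chunks = search_index(question, k)

    # 2. Augment: build a grounded prompt so the model answers from the chunks.
    context = "\n\n".join(f"[{i+1}] {c}" for i, c in enumerate(chunks))
    prompt = (
        "Answer using only the context below. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

    # 3. Generate: the LLM produces an answer constrained by the retrieved context.
    return generate(prompt)

# Toy usage with stubbed dependencies.
docs = ["Invoices are archived after 7 years.", "Refunds require manager approval."]
stub_search = lambda q, k: docs[:k]
stub_llm = lambda p: "Refunds require manager approval [2]."
print(answer_with_rag("Who approves refunds?", stub_search, stub_llm))
```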

Three out of four companies are betting big on AI – but most are digging on shifting ground. In this $100 billion gold rush, none of these investments will pay off without data quality and strong governance – and that remains a challenge for many organizations. Not every enterprise has a solid data governance practice, and maturity models vary widely. As a result, investments in innovation initiatives are at risk of failure. What are the most important data management issues to prioritize? See how your organization measures up and get ahead of the curve with Actian.

Discover how to build a powerful AI Lakehouse and unified data fabric natively on Google Cloud. Leverage BigQuery's serverless scale and robust analytics capabilities as the core, seamlessly integrating open data formats with Apache Iceberg and efficient processing using managed Spark environments like Dataproc. Explore the essential components of this modern data environment, including data architecture best practices, robust integration strategies, high data quality assurance, and efficient metadata management with Google Cloud Data Catalog. Learn how Google Cloud's comprehensive ecosystem accelerates advanced analytics, preparing your data for sophisticated machine learning initiatives and enabling direct connection to services like Vertex AI. 
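A small sketch of the Dataproc-plus-BigQuery piece: reading a BigQuery table from a managed Spark environment via the spark-bigquery connector and running a basic quality check on it. The project, dataset, table, and column names are placeholders, and the connector is assumed to be on the cluster's classpath (it ships with recent Dataproc images).

```python
# Sketch: read a BigQuery table from Spark (e.g. on Dataproc) and run a simple
# quality check. Assumes the spark-bigquery connector is on the classpath;
# project/dataset/table and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bq-quality-check").getOrCreate()

orders = (
    spark.read.format("bigquery")
    .option("table", "my-project.sales.orders")  # placeholder table
    .load()
)

# Basic quality signal: null rate of a key business column.
total = orders.count()
null_amounts = orders.filter(F.col("order_amount").isNull()).count()
null_rate = null_amounts / total if total else 0.0
print(f"rows={total}, null order_amount rate={null_rate:.2%}")

# Fail fast in a pipeline if the null rate breaches an (illustrative) threshold.
assert null_rate < 0.01, "order_amount null rate above 1% threshold"
```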

As AI adoption accelerates across industries, many organisations are realising that building a model is only the beginning. Real-world deployment of AI demands robust infrastructure, clean and connected data, and secure, scalable MLOps pipelines. In this panel, experts from across the AI ecosystem share lessons from the frontlines of operationalising AI at scale.

We’ll dig into the tough questions:

• What are the biggest blockers to AI adoption in large enterprises — and how can we overcome them?

• Why does bad data still derail even the most advanced models, and how can we fix the data quality gap?

• Where does synthetic data fit into real-world AI pipelines — and how do we define “real” data?

• Is Agentic AI the next evolution, or just noise — and how should MLOps prepare?

• What does a modern, secure AI stack look like when using external partners and APIs?

Expect sharp perspectives on data integration, model lifecycle management, and the cyber-physical infrastructure needed to make AI more than just a POC.

A pioneer of the low-code market since 2001, enterprise software delivery provider OutSystems has evolved rapidly alongside the changing landscape of data. With a global presence and a vast community of over 750,000 members, OutSystems continues to leverage innovative tools, including data observability and generative AI, to help their customers succeed.

In this session, Pedro Sá Martins, Head of Data Engineering, will share the evolution of OutSystems’ data landscape, including how OutSystems has partnered with Snowflake, Fivetran and Monte Carlo to address their modern data challenges. He’ll share best practices for implementing scalable data quality programs to drive innovative technologies, as well as what’s on the data horizon for the OutSystems team.

Data governance often begins with Data Defense — centralized stewardship focused on compliance and regulatory needs, built on passive metadata, manual documentation, and heavy SME reliance. While effective for audits, this top-down approach offers limited business value. 

Data Governance has since moved to a Data Offense model, driving monetization of critical data assets by focusing on analytics and data science outcomes: better decision-making and improved customer and associate experiences. This involves integrating data quality and observability with a shift-left approach grounded in tangible impact on business outcomes, improved governance maturity, and accelerated resolution of business-impacting issues.

The next phase advances Data Stewardship toward AI-augmented and autonomous stewardship: embedding SME knowledge into automated workflows, managing critical assets autonomously, and delivering actionable context through proactive, shift-left observability, producer–consumer contracts, and SLAs built into data product development.
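As a concrete (and hypothetical) illustration of the producer–consumer contracts and SLAs mentioned above, the sketch below declares a minimal contract for a data product and checks an incoming batch against it. The field names, types, and SLA window are invented for the example.

```python
# Hypothetical data contract: schema expectations plus a freshness SLA,
# checked at the producer/consumer boundary. All names and values are examples.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DataContract:
    required_fields: dict[str, type]   # column name -> expected Python type
    freshness_sla: timedelta           # maximum allowed age of the batch

def check_batch(contract: DataContract, rows: list[dict], loaded_at: datetime) -> list[str]:
    issues = []
    # Schema: every required field must be present with the agreed type.
    for name, expected in contract.required_fields.items():
        for row in rows:
            if name not in row:
                issues.append(f"missing field '{name}'")
                break
            if row[name] is not None and not isinstance(row[name], expected):
                issues.append(f"field '{name}' is not {expected.__name__}")
                break
    # Freshness SLA: the batch must not be older than the agreed window.
    if datetime.now(timezone.utc) - loaded_at > contract.freshness_sla:
        issues.append("freshness SLA breached")
    return issues

contract = DataContract(
    required_fields={"customer_id": int, "email": str},
    freshness_sla=timedelta(hours=6),
)
batch = [{"customer_id": 42, "email": "a@example.com"}, {"customer_id": "x", "email": None}]
print(check_batch(contract, batch, loaded_at=datetime.now(timezone.utc)))
# -> ["field 'customer_id' is not int"]
```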

Ten years ago, I began advocating for **DataOps**, a framework designed to improve collaboration, efficiency, and agility in data management. The industry was still grappling with fragmented workflows, slow delivery cycles, and a disconnect between data teams and business needs. Fast forward to today, and the landscape has transformed, but have we truly embraced the future of leveraging data at scale? This session will reflect on the evolution of DataOps, examining what’s changed, what challenges persist, and where we're headed next.

**Key Takeaways:**

✅ The biggest wins and ongoing struggles in implementing DataOps over the last decade. 

✅ Practical strategies for improving automation, governance, and data quality in modern workflows. 

✅ How emerging trends like AI-driven automation and real-time analytics are reshaping the way we approach data management. 

✅ Actionable insights on how data teams can stay agile and align better with business objectives. 

**Why Attend?**

If you're a data professional, architect, or leader striving for operational excellence, this talk will equip you with the knowledge to future-proof your data strategies.

Penguin Random House, the world’s largest trade book publisher, relies on data to power every part of its global business, from supply chain operations to editorial workflows and royalty reconciliation. As the complexity of PRH’s dbt pipelines grew, manual checks and brittle tests could no longer keep pace. The Data Governance team knew they needed a smarter, scalable approach to ensure trusted data.

In this session, Kerry Philips, Head of Data Governance at Penguin Random House, will reveal how the team transformed data quality using Sifflet’s observability platform. Learn how PRH integrated column-level lineage, business-rule-aware logic, and real-time alerts into a single workspace, turning fragmented testing into a cohesive strategy for trust, transparency, and agility.

Attendees will gain actionable insights on:

- Rapidly deploying observability without disrupting existing dbt workflows

- Encoding business logic into automated data tests

- Reducing incident resolution times and freeing engineers to innovate

- Empowering analysts to act on data with confidence

If you’ve ever wondered how a company managing millions of ISBNs ensures every dashboard tells the truth, this session offers a behind-the-scenes look at how data observability became PRH’s newest bestseller.
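The "business-rule-aware logic" above is only described at a high level; as a generic illustration (not Sifflet's or PRH's actual rules), the test below encodes one plausible publishing rule, that royalty rates fall within contractual bounds and ISBNs are well formed, as an automated check over a pandas DataFrame. Column names and bounds are assumptions.

```python
# Illustrative business-rule test: royalty rates must sit inside contractual
# bounds and every ISBN must be 13 digits. Columns and bounds are invented.
import pandas as pd

def test_royalty_rules(df: pd.DataFrame) -> list[str]:
    failures = []
    out_of_bounds = df[(df["royalty_rate"] < 0.05) | (df["royalty_rate"] > 0.25)]
    if not out_of_bounds.empty:
        failures.append(f"{len(out_of_bounds)} rows with royalty_rate outside 5%-25%")
    bad_isbn = df[~df["isbn"].astype(str).str.fullmatch(r"\d{13}")]
    if not bad_isbn.empty:
        failures.append(f"{len(bad_isbn)} rows with malformed ISBN-13")
    return failures

sample = pd.DataFrame({
    "isbn": ["9780141036144", "97801410"],
    "royalty_rate": [0.10, 0.40],
})
print(test_royalty_rules(sample))  # two failures expected
```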

AI-powered development tools are accelerating development speed across the board, and analytics event implementation is no exception. Without appropriate guardrails, though, they are very capable of creating organizational chaos: same company, same prompt, completely different schemas, and data teams can’t analyze what should be identical events across platforms.

The infrastructure assumptions that worked when developers shipped tracking changes in sprint cycles or quarters are breaking when they ship them multiple times per day. Schema inconsistency, cost surprises from experimental traffic, and trust erosion in AI-generated code are becoming the new normal.

Josh will demonstrate how Snowplow’s MCP (Model Context Protocol) server and data-structure toolchains enable teams to harness AI development speed while maintaining data quality and architectural consistency. Using Snowplow’s production approach of AI-powered design paired with deterministic implementation, teams get rapid iteration without the hallucination bugs that plague direct AI code generation.
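The "deterministic implementation" half of that approach can be pictured as schema-first validation: an event is accepted only if it matches a versioned schema, regardless of what the AI-generated tracking code sends. The sketch below uses the jsonschema library with an invented event schema; it is a generic illustration, not Snowplow's MCP server or data-structure toolchain.

```python
# Generic schema-first validation: accept an analytics event only if it matches
# a versioned JSON Schema. The schema and event fields are illustrative.
from jsonschema import validate, ValidationError

CHECKOUT_STARTED_V1 = {
    "type": "object",
    "properties": {
        "event": {"const": "checkout_started"},
        "cart_value": {"type": "number", "minimum": 0},
        "currency": {"type": "string", "pattern": "^[A-Z]{3}$"},
    },
    "required": ["event", "cart_value", "currency"],
    "additionalProperties": False,
}

def accept(event: dict) -> bool:
    """Deterministically accept or reject an event against the agreed schema."""
    try:
        validate(instance=event, schema=CHECKOUT_STARTED_V1)
        return True
    except ValidationError as err:
        print(f"rejected: {err.message}")
        return False

# Two 'AI-generated' variants of the same event: only the schema-conformant one passes.
accept({"event": "checkout_started", "cart_value": 59.99, "currency": "GBP"})
accept({"event": "checkout_started", "basket_total": 59.99, "currency": "gbp"})
```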

Key Takeaways:

• How AI development acceleration is fragmenting analytics schemas within organizations

• Architectural patterns that separate AI creativity from production reliability

• Real-world implementation using MCP, Data Products, and deterministic code generation

In an era where data complexity and scale challenge every organization, manual intervention can no longer keep pace. Prizm by DQLabs redefines the paradigm—offering a no-touch, agentic data platform that seamlessly integrates Data Quality, Observability, and Semantic Intelligence into one self-learning, self-optimizing ecosystem.

Unlike legacy systems, Prizm is AI-native and agentic by design: built from the ground up around a network of intelligent, role-driven agents that observe, recommend, act, and learn in concert to deliver continuous, autonomous data trust.

Join us at Big Data London to discover how Prizm’s agent-driven anomaly detection, data quality enforcement, and deep semantic analysis set a new industry standard, shifting data and AI trust from an operational burden to a competitive advantage that powers actionable, insight-driven outcomes.
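Agent-driven anomaly detection is described here only at the marketing level; as a generic, hypothetical illustration of the underlying idea, the sketch below flags a daily row count that deviates sharply from its recent history using a simple z-score. It is not Prizm's algorithm, and the window size and threshold are arbitrary choices.

```python
# Generic anomaly check: flag today's row count if it is far from the recent
# mean (simple z-score). Window size and threshold are illustrative choices.
from statistics import mean, stdev

def is_anomalous(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold

daily_row_counts = [102_300, 101_950, 102_800, 102_100, 101_700, 102_450, 102_050]
print(is_anomalous(daily_row_counts, today=55_000))   # True: likely a broken load
print(is_anomalous(daily_row_counts, today=102_600))  # False: within normal range
```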
