talk-data.com

Topic

Data Quality

data_management data_cleansing data_validation

537 tagged

Activity Trend: 82 peak/qtr (2020-Q1 to 2026-Q1)

Activities

537 activities · Newest first

Coalesce 2024: Breaking the mold: A smarter approach to data testing

Current data testing practices—meticulously testing individual models and methods—are not only outdated but also costly and inefficient. In this talk, Aiven challenges this traditional approach, which they argue accumulates unnecessary technical debt and inflates warehousing costs without improving data quality.

Speakers: Anton Heikinheimo, Senior Data Engineer, Aiven

Emiel Verkade, Senior Analytics Engineer, Aiven

Read the blog to learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale: https://www.getdbt.com/blog/coalesce-2024-product-announcements

Coalesce 2024: Optimize your dbt pipelines in Snowflake instantly

Are you looking to optimize your data workflows and elevate data quality in Snowflake? The analytics engineering team at Lyst achieved just that with innovative dbt macros…and it’s time to share the wealth!

In this session, Naomi will share two game-changing dbt macros that streamline data pipelines, reduce redundant efforts, and ensure robust data quality. Learn how her team’s approach to querying the dbt DAG identifies outdated models and simplifies pipeline migrations by detecting downstream impacts. Plus, uncover a fresh strategy for enhancing primary key testing on incremental models, resulting in instant efficiency and accuracy improvements.
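Her macros run inside dbt itself, where the DAG is exposed through dbt's `graph` context variable. As a rough, hypothetical stand-in for the first idea, here is a short Python sketch that asks the same question ("which models does nothing depend on?") of the target/manifest.json artifact that dbt writes; the artifact keys are standard dbt metadata, but the heuristic is illustrative, not the Lyst implementation:

```python
import json

# Hypothetical sketch, not the Lyst macros: scan dbt's manifest
# artifact (written by `dbt compile` / `dbt run`) for models with
# no downstream consumers -- candidates for deprecation review.
with open("target/manifest.json") as f:
    manifest = json.load(f)

child_map = manifest.get("child_map", {})  # unique_id -> downstream unique_ids

for unique_id, node in manifest.get("nodes", {}).items():
    if node.get("resource_type") != "model":
        continue
    # Ignore the model's own tests; look for real downstream consumers.
    downstream = [c for c in child_map.get(unique_id, []) if not c.startswith("test.")]
    if not downstream:
        print(f"No downstream consumers: {node['name']} ({unique_id})")
```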

You will walk away with actionable next steps to adopt these macros and best practices in your organization.

Speaker: Naomi Johnson, Director of Data Platform, Lyst

Coalesce 2024: Don't panic: What to do when your data breaks

This talk isn't about data quality tests. Why? Well, because there’s no shortage of tools and processes for testing your data, monitoring your data, and alerting your team when the pipeline breaks. Instead, this is a talk about what happens after the alarm goes off.

Data incidents can quickly snowball into much more than simply fixing the issue. Juggling comms, diagnosing the issue, ownership of the failure, your DMs lighting up…we’ve all been there and understand just how stressful it can be.

In this session, Matilda covers a number of tool- and product-agnostic approaches teams can adopt to improve how they manage data incidents. These are practical steps any data team can follow to improve how they resolve, communicate, and learn from incidents, with the goal of building more resilient data systems.

Speaker: Matilda Hultgren, Data Analyst - Product, incident.io

Coalesce 2024: From Core to Cloud: Unlocking dbt at Warner Brothers Discovery (CNN)

Since the beginning of 2024, the Warner Brothers Discovery team supporting the CNN data platform has been carrying out an extensive migration from dbt Core to dbt Cloud. Concurrently, the team is segmenting its project into a multi-project framework using dbt Mesh. In this talk, Zachary will review how this transition has simplified data pipelines, improved pipeline performance and data quality, and made data collaboration at scale more seamless.

He'll discuss how dbt Cloud features like the Cloud IDE, automated testing, documentation, and code deployment have enabled the team to standardize on a single developer platform while also managing dependencies effectively. He'll share details on how the automation framework they built using Terraform streamlines dbt project deployments with dbt Cloud to a ""push-button"" process. By leveraging an infrastructure as code experience, they can orchestrate the creation of environment variables, dbt Cloud jobs, Airflow connections, and AWS secrets with a unified approach that ensures consistency and reliability across projects.

Speakers: Mamta Gupta, Staff Analytics Engineer, Warner Brothers Discovery

Zachary Lancaster, Manager, Data Engineering, Warner Brothers Discovery

Businesses are constantly racing to stay ahead by adopting the latest data tools and AI technologies. But with so many options and buzzwords, it's easy to get lost in the excitement without knowing whether these tools truly serve your business. How can you ensure that your data stack is not only modern but sustainable and agile enough to adapt to changing needs? What does it take to build data products that deliver real value to your teams while driving innovation?

Adrian Estala is VP, Field Chief Data Officer and the host of Starburst TV. With a background in leading Digital and IT Portfolio Transformations, he understands the value of creating executive frameworks that focus on material business outcomes. Skilled at getting the most out of data-driven investments, Adrian is your trusted adviser for navigating complex data environments and integrating a Data Mesh strategy in your organization.

In the episode, Richie and Adrian explore the modern data stack, agility in data, collaboration between business and data teams, data products and differing ways of building them, data discovery and metadata, data quality, career skills for data practitioners and much more.

Links Mentioned in the Show:
• Starburst
• Connect with Adrian
• Career Track: Data Engineer in Python
• Related Episode: How this Accenture CDO is Navigating the AI Revolution
• Rewatch sessions from RADAR: AI Edition

New to DataCamp? Learn on the go using the DataCamp mobile app. Empower your business with world-class data and AI skills with DataCamp for business.

Data Engineering for Machine Learning Pipelines: From Python Libraries to ML Pipelines and Cloud Platforms

This book covers modern data engineering functions and important Python libraries, to help you develop state-of-the-art ML pipelines and integration code.

The book begins by explaining data analytics and transformation, delving into the Pandas library, its capabilities, and nuances. It then explores emerging libraries such as Polars and CuDF, providing insights into GPU-based computing and cutting-edge data manipulation techniques. The text discusses the importance of data validation in engineering processes, introducing tools such as Great Expectations and Pandera to ensure data quality and reliability. The book delves into API design and development, with a specific focus on leveraging the power of FastAPI. It covers authentication, authorization, and real-world applications, enabling you to construct efficient and secure APIs using FastAPI. Also explored is concurrency in data engineering, examining Dask's capabilities from basic setup to crafting advanced machine learning pipelines. The book includes development and delivery of data engineering pipelines using leading cloud platforms such as AWS, Google Cloud, and Microsoft Azure. The concluding chapters concentrate on real-time and streaming data engineering pipelines, emphasizing Apache Kafka and workflow orchestration in data engineering. Workflow tools such as Airflow and Prefect are introduced to seamlessly manage and automate complex data workflows.

What sets this book apart is its blend of theoretical knowledge and practical application, a structured path from basic to advanced concepts, and insights into using state-of-the-art tools. With this book, you gain access to cutting-edge techniques and insights that are reshaping the industry. This book is not just an educational tool; it is a career catalyst and an investment in your future as a data engineering expert, poised to meet the challenges of today's data-driven world.

What You Will Learn:
• Elevate your data wrangling jobs by utilizing the power of both CPU and GPU computing, and learn to process data using Pandas 2.0, Polars, and CuDF at unprecedented speeds
• Design data validation pipelines, construct efficient data service APIs, develop real-time streaming pipelines, and master the art of workflow orchestration to streamline your engineering projects
• Leverage concurrent programming to develop machine learning pipelines and get hands-on experience in the development and deployment of machine learning pipelines across AWS, GCP, and Azure

Who This Book Is For:
Data analysts, data engineers, data scientists, machine learning engineers, and MLOps specialists
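As a taste of the data-validation chapters, here is a minimal sketch using Pandera, one of the two validation libraries the book introduces. The schema, column names, and rules are invented for the example, not taken from the book:

```python
import pandas as pd
import pandera as pa

# Declare the shape and rules a DataFrame must satisfy, then validate.
# The schema below is a made-up example, not one from the book.
schema = pa.DataFrameSchema(
    {
        "order_id": pa.Column(int, unique=True),
        "amount": pa.Column(float, checks=pa.Check.ge(0)),
        "currency": pa.Column(str, checks=pa.Check.isin(["USD", "EUR", "GBP"])),
    }
)

df = pd.DataFrame(
    {"order_id": [1, 2], "amount": [19.99, 5.00], "currency": ["USD", "EUR"]}
)

validated = schema.validate(df)  # raises pa.errors.SchemaError on violations
print(validated)
```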

Every day, banking institution Capital on Tap calculates thousands of credit scores, directly impacting how their customers receive credit cards or additional lines of credit. Data quality is paramount – incorrect credit scores can set off a wide range of long-lasting financial implications for their customers, which is why the team turned to data observability with Monte Carlo to improve their data – and credit score – reliability.

But, as with any new tool in your tech stack, onboarding new processes for key users is just as important as onboarding the tool itself. 

Join this session with Ben Jones and Soren Rehn to hear why the Analytics Engineering team at Capital on Tap decided to invest in a data observability tool, how their processes play a critical role in maximizing the tool's value (including a few missteps and recalibrations along the way), and the strategies employed to garner widespread success and buy-in over time.

Join Experian, Sainsbury's, The Nottingham, UST and British Business Bank as they discuss how better data quality and better data governance lead to improved AI. Hear real business examples of how AI is being implemented and the lessons our panellists wished they'd known sooner. Also learn key takeaways on how to build a better data governance strategy and why having trust in your data is more important than any new emerging technology.

Enterprises that deploy data observability report fewer and shorter incidents due to data quality issues. However, deploying data observability widely within an enterprise can be daunting, especially for teams who have experienced a heavy lift when rolling out other data governance technologies. This talk will review the top challenges enterprises will face when pursuing a data observability initiative, and a mix of process and technology solutions that can mitigate them to speed up time to value, so data governance teams can show business-facing results quickly.

As the hype for AI grows, organizations are still wrestling with the fundamentals of data governance. The ambitions of executives and boardrooms to implement next-gen AI use cases hinge on a solid data foundation, including cataloging, ownership, and data quality. Join Collibra's Chief Data Citizen, Stijn Christiaens, and Vodafone's Sr. Data Governance Manager, Fede Frumento, to learn how Vodafone has used data governance fundamentals to increase the scalability and collaboration of GenAI use cases.

In today's data-driven world, data mastery is crucial for success. Enter Data Observability, a revolutionary approach that tackles complex challenges and unlocks new possibilities in the age of AI. This session explores the transformative power of Data Observability through compelling use cases across various industries, including retail, finance, manufacturing, healthcare, and more.

As a leader in Data Observability, Acceldata will showcase how organizations can:

Detect and resolve data issues in real-time

Ensure data integrity throughout complex transformations

Maintain consistent data quality across diverse systems

See how retail giants optimize supply chains and enhance customer experiences, how financial institutions achieve superior compliance and risk management, and how manufacturers leverage data for efficiency and innovation.

This presentation goes beyond theory, showcasing the immense potential of Data Observability.

Face To Face
by Jennifer Jackson (Actian, a division of HCLSoftware) and Emma McGrattan (Actian, a division of HCLSoftware)

Many companies are under pressure to implement Gen AI ASAP, but not everyone sees the risks clearly. New Actian research shows nearly 80% of respondents think their data quality is up to the task. But real data prep takes more work than most business leaders think. How can you avoid the data prep pitfalls that can tank a Gen AI initiative? How can you move quickly—and confidently—into the Gen AI era? Actian's SVP of Engineering & Product, Emma McGrattan, and CMO Jennifer Jackson will share research and customer perspectives to explain true data readiness and how to optimize your Gen AI journey.

As organizations are exploring and expanding their AI capabilities, Chief Data Officers are now responsible for governing the data for responsible and trustworthy AI. This session will cover 5 key principles to ensure successful adoption and scaling of AI initiatives that align with the company's business strategy. From data quality to advocating for ethical AI practices, the Chief Data Officer's mandate has expanded to compliance with new AI regulations. Peggy Tsai, Chief Data Officer at BigID and adjunct faculty member at Carnegie Mellon University for the Chief Data Officer executive program, will provide insights into the AI governance strategies and outcomes crucial for cultivating an AI-first organization. Drawing on her extensive experience in data governance and AI, this session will provide invaluable guidance for all participants aiming to adopt industry-leading practices.

The data engineer role has expanded far beyond data pipeline management. Data engineers are now tasked with managing scalable infrastructure, optimizing cloud resources, and ensuring real-time data processing, while keeping costs in check - which continues to be quite challenging.

In this session, Revefi will demonstrate Raden, the world’s first AI data engineer. Raden augments data teams with “distinguished engineer level” expertise in data architecture, system performance, optimization, and cost management.

Raden uses GenAI and AI to address these challenges, working with your team as an 👩‍✈️ AutoPilot and/or 👨‍✈️ CoPilot and automating critical functions such as Data Quality, Data Observability, Spend Management, Performance Management, and Usage Management, allowing your data team to tackle complex use cases with ease.

Join us to discover how you can revamp your data engineering practices and dramatically improve the ROI from your data investments.

We'll explore innovative strategies to shift DQ from a technical to a business-centric mindset. This session will guide the audience in transforming DQ into a tool that business owners won't just use, but will rely on every morning to kickstart their day. The focus will be to reinvent how your organization perceives and interacts with Data Quality, making it an integral part of your business narrative.

Finding anomalies and significant deviations in high-quality, business-critical data in real time is key to any data-intensive business, especially in the financial sector where trust and reliability are paramount. Join us for a deep dive on how global real-time payment network Volt leverages Validio's data quality and observability platform to catch deviations in key metrics such as traffic and payment volume, payment initiation, and conversion.
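Validio's platform is proprietary, but the core pattern, flagging metric values that stray too far from their recent history, can be sketched generically. A minimal example, assuming hourly payment-volume data and a 24-hour rolling window (both invented for illustration, not Validio's method):

```python
import numpy as np
import pandas as pd

# Generic rolling z-score check, not Validio's implementation: flag
# hourly payment volume that departs from its trailing 24-hour mean
# by more than 3 standard deviations.
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=24 * 7, freq="h")
volume = pd.Series(rng.normal(1000.0, 50.0, len(idx)), index=idx)
volume.iloc[100] = 2500.0  # injected spike to demonstrate detection

def flag_anomalies(series: pd.Series, window: int = 24, z: float = 3.0) -> pd.Series:
    rolling = series.rolling(window)
    score = (series - rolling.mean()).abs() / rolling.std()
    return score > z

print(volume[flag_anomalies(volume)])  # surfaces the injected spike
```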

Data quality issues due to a lack of clear data definitions, for example unclear surface-area definitions.

  • Solution: set up data governance and an organization to support it; implement a data catalog, with DataGalaxy as the chosen tool.
  • Results: data catalog adoption within Cofinimmo, with efficiency gains and risk reduction thanks to clear data definitions and improved data quality.

The success of AI initiatives hinges on DATA. According to recent research, only 10% of enterprises will achieve the expected ROI from their Generative AI deployments, with data quality issues being the most cited reason for failure. The core message is clear: 'You are as AI-ready as your data.' This session will explore practical approaches to overcoming common data challenges and ensuring your data meets the specific requirements of AI techniques.

Key Takeaways:

• Understanding AI readiness and how to assess it

• AI-ready data: two core foundations

• Build a scalable data infrastructure that accelerates AI deployment and innovation, using the DQLabs framework and practical approaches to fix your data problems

Data Observability is the new frontier of modern data management. Leading enterprises rely on Acceldata to ensure data quality, streamline operations, optimize costs, and maintain compliance. Join us to discover how to implement Enterprise Data Observability across on-prem and cloud environments, creating a single source of truth for data leaders, engineers, scientists, and business users. Learn from a seasoned industry expert who has successfully operationalized data governance at petabyte scale and delivers reliable data for AI and analytics initiatives.

Lunar, a leading Nordic digital bank, successfully implemented a data governance framework to enhance data quality and secure C-level buy-in by using SYNQ, a data reliability and observability tool. 

Their framework focuses on data ownership, criticality, and monitoring. Lunar's data team, leveraging tools like SYNQ, upholds high standards in areas such as financial-crime prevention, AI-driven personalisation, and reliable reporting.

They maintain oversight through automated monitoring, use of data products, and a robust ownership model, which enhances data quality and accelerates issue resolution for their reports to executives. 

This approach enables Lunar's data engineering and data governance teams to work in harmony and operate efficiently without having to increase headcount.