talk-data.com

Topic

JSON

JavaScript Object Notation (JSON)

data_format lightweight web_development file_format

2 tagged

Activity Trend: 9 peak/qtr, 2020-Q1 to 2026-Q1

Activities

Filtered by: dbt Coalesce 2023

Scaling dbt models for CDC on large databases - Coalesce 2023

Unlike transforming staged data into marts, ingesting data into staging requires robustness to changes in data volume and type, schema evolution, and data drift. Especially when performing change data capture (CDC) on large databases (~100 tables per database), we’ll ideally reinforce our dbt models with automatic:

  • mapping of dynamic columns and data types between the source and the target stage
  • evolution of stage table schemas at pace with incoming data, including for nested data structures
  • parsing and flattening of any arrays and JSON structs in the data.
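
The three tasks above can be illustrated with a minimal Python sketch. It is not taken from the talk and does not show how Upsolver or dbt implement them; the function names, the `__` key separator, the coarse type names, and the sample record are all assumptions made for illustration.

```python
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with '__'.

    Arrays are kept as JSON strings; exploding them into rows is a
    separate design decision that this sketch does not make.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def infer_type(value: Any) -> str:
    """Map a Python value to a coarse, hypothetical column type name."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "varchar"


def new_columns(known: dict[str, str], record: dict[str, Any]) -> dict[str, str]:
    """Return columns found in the flattened record but missing from the stage schema."""
    flat = flatten(record)
    return {
        name: infer_type(value)
        for name, value in flat.items()
        if name not in known and value is not None
    }


# Hypothetical stage schema and incoming CDC record.
stage_schema = {"id": "bigint", "customer__name": "varchar"}
event = {
    "id": 7,
    "customer": {"name": "Ada", "vip": True},
    "items": [{"sku": "A1"}, {"sku": "B2"}],
}

flat_event = flatten(event)
added = new_columns(stage_schema, event)
```

Here `added` holds the columns a stage table would need to gain before the record can be loaded, which is the kind of decision that becomes impractical to make by hand across roughly 100 tables.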

Manually performing these tasks for data at scale is a tall order because of the many ways in which CDC data can deviate. Deferring them to mart transformation models is potentially detrimental to the business and does not reduce the complexity. Santona Tuli shares lessons learned from integrating dbt Core into high-scale data ingestion workloads, including trade-offs between ease of use and scale.

Speaker: Santona Tuli, Head/Director of Data, Upsolver

Register for Coalesce at https://coalesce.getdbt.com

Using JSON schema to set the (dbt) stage for product analytics - Coalesce 2023

Surfline uses Segment to collect product analytics events to understand how surfers use its forecasts and live surf cameras across 9,000+ surf spots worldwide. The team developed an open source tool to define and manage product analytics event schemas using JSON Schema; these schemas are then used to build dbt staging models for all events.
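
As a rough illustration of the idea, and not a reproduction of Surfline's tool, the sketch below renders a dbt staging model from a JSON Schema document. The event name `spot_viewed`, its properties, the `segment` source name, and the JSON-to-SQL type mapping are all hypothetical; only the `{{ source(...) }}` call is standard dbt syntax.

```python
# Hypothetical mapping from JSON Schema types to warehouse column types.
TYPE_MAP = {
    "string": "varchar",
    "integer": "bigint",
    "number": "double",
    "boolean": "boolean",
}


def sql_type_for(spec: dict) -> str:
    """Pick a SQL type for one JSON Schema property, defaulting to varchar."""
    json_type = spec.get("type")
    if isinstance(json_type, list):  # e.g. ["string", "null"]
        json_type = next((t for t in json_type if t != "null"), None)
    return TYPE_MAP.get(json_type, "varchar")


def staging_model_sql(schema: dict, source_name: str, table_name: str) -> str:
    """Render a dbt staging model that casts each property named in the schema."""
    columns = [
        f"    cast({name} as {sql_type_for(spec)}) as {name}"
        for name, spec in schema["properties"].items()
    ]
    select_list = ",\n".join(columns)
    return (
        "select\n"
        f"{select_list}\n"
        f"from {{{{ source('{source_name}', '{table_name}') }}}}\n"
    )


# Hypothetical event schema, written in JSON Schema form.
event_schema = {
    "title": "spot_viewed",
    "type": "object",
    "properties": {
        "user_id": {"type": "string"},
        "spot_id": {"type": "string"},
        "duration_seconds": {"type": ["number", "null"]},
        "is_premium": {"type": "boolean"},
    },
    "required": ["user_id", "spot_id"],
}

sql = staging_model_sql(event_schema, "segment", "spot_viewed")
print(sql)
```

Generating the model from the schema means the schema stays the single place where an event's columns and types are agreed on, so the staging layer cannot silently drift from it.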

With this solution, the data team has more time to build intermediate and mart models in dbt, knowing that the staging layer fully reflects Surfline’s product analytics events. This presentation is a real-life example of how schemas (or data contracts) can be used as a medium to build consensus, enforce standards, improve data quality, and speed up the dbt workflow for product analytics.

Speaker: Greg Clunies, Senior Analytics Engineer, Surfline

Register for Coalesce at https://coalesce.getdbt.com/