talk-data.com
Master Schema Translations in the Era of Open Data Lake
Description
Unity Catalog puts variety of schemas into a centralized repository, now the developer community wants more productivity and automation for schema inference, translation, evolution and optimization especially for the scenarios of ingestion and reverse-ETL with more code generations.Coinbase Data Platform attempts to pave a path with "Schemaster" to interact with data catalog with the (proposed) metadata model to make schema translation and evolution more manageable across some of the popular systems, such as Delta, Iceberg, Snowflake, Kafka, MongoDB, DynamoDB, Postgres...This Lighting Talk covers 4 areas: The complexity and caveats of schema differences among The proposed field-level metadata model, and 2 translation patterns: point-to-point vs hub-and-spoke Why Data Profiling be augmented to enhance schema understanding and translation Integrate it with Ingestion & Reverse-ETL in a Databricks-oriented eco system Takeaway: standardize schema lineage & translation