Topic

Data Vault

data_modeling data_warehouse analytics analytics_engineering

Activities: 1 tagged

Activity Trend: peak of 4 per quarter, 2020-Q1 to 2026-Q2

Top Events

Data Engineering Podcast: 7
O'Reilly Data Engineering Books: 4
Data + AI Summit 2025: 2
The Joe Reis Show: 2
dbt Coalesce 2023: 2
PyConDE & PyData Berlin 2023: 1
DATA MINER Big Data Europe Conference 2020: 1
O'Reilly Data Science Books: 1
Die Data Engineering Reise: 1
The Analytics Engineering Podcast: 1
Snowflake World Tour Amsterdam: 1
Big Data LDN 2024: 1

Top Speakers

Tobias Macey: 7
Serge Gershkovich (SQL DBM): 3
Joe Reis (DeepLearning.AI): 2
Kent Graziano (SnowflakeDB): 2
Michael Olschimke (Scalefree): 2
Brandon Taylor (Guild): 2
Jos van Dongen: 1
George Park: 1
Daniel Linstedt: 1
Ahmed Elsamadisi (Narrator): 1
Olivia Ren (Databricks): 1
Antonia Scherz: 1

Activities

Unlocking Information - Creating Synthetic Data for Open Access.

2023-04-19 · PyConDE & PyData Berlin 2023

talk

by Antonia Scherz

Many good project ideas fail before they even start because they require sensitive personal data. The good news: a synthetic version of this data does not need protection. Synthetic data copies the structure and statistical properties of the actual data without recreating personally identifiable information. The bad news: it is difficult to create synthetic data for open-access use without reproducing an exact copy of the actual data. This talk gives hands-on insights into synthetic data creation and the challenges along its lifecycle. We will learn how to create and evaluate synthetic data for any use case using the open-source package Synthetic Data Vault. We will also find out why it takes so long to synthesize the huge amount of data lying dormant in public administration. The talk addresses data owners who want to open up access to their private data as well as analysts looking to use synthetic data. After this session, listeners will know which steps to take to generate synthetic data for multi-purpose use and will understand its limitations for real-world analyses.