talk-data.com

Speaker

Yev Meyer

Principal Research Scientist, NVIDIA

Yev Meyer is a Principal Research Scientist at NVIDIA. Before that, Yev was Chief Scientist at Gretel, the synthetic data platform for developers, and led AI/ML teams at several enterprise SaaS startups, including Guru, Curalate, and RJMetrics (precursor to dbt Labs and Stitch). He holds a PhD in Computational Neuroscience from Columbia University, where he developed models of dendritic processing and multisensory encoding in spiking neural circuits. He also performed brain surgery on fruit flies to validate these models on (non-artificial) neural networks of the olfactory system.

Bio from: Data + AI Summit 2025

Talks & appearances

Improve AI Training With the First Synthetic Personas Dataset Aligned to Real-World Distributions

A big challenge in LLM development and synthetic data generation is ensuring data quality and diversity. While data incorporating varied perspectives and reasoning traces consistently improves model performance, procuring such data remains impossible for most enterprises. Human-annotated data struggles to scale, while purely LLM-based generation often suffers from distribution clipping and low entropy. In a novel compound AI approach, we combine LLMs with probabilistic graphical models and other tools to generate synthetic personas grounded in real demographic statistics. The approach allows us to address major limitations in bias, licensing, and persona skew of existing methods. We release the first open-source dataset aligned with real-world distributions and show how enterprises can leverage it with Gretel Data Designer (now part of NVIDIA) to bring diversity and quality to model training on the Databricks platform, all while addressing model collapse and data provenance concerns head-on.