Ever wondered how systems like Udio and Stable Audio turn a text prompt into a full-fledged song? In this talk, we’ll pull back the curtain on the technology behind text-to-music generation, focusing on latent diffusion models. We’ll compare popular model architectures, break down key concepts with intuitive visuals, and explore the “why” behind their design choices. No deep learning background needed, just curiosity! We’ll end with a short interactive quiz to recap and test your understanding.
Speaker
Arjun Bahuguna
Co-founder
Audio Realities
Arjun is the co-founder of Audio Realities, an audio tech startup based in Aachen, where he leads the development of multilingual voice conversion models. He was previously at Pitch Innovations, a plugin company based in Chennai, where he worked on machine learning for music and plugins. He has been exploring deep learning since 2016. His work has been featured at NeurIPS, ADC Bristol, and ADCx India, and will be presented at DCASE 2025 in Barcelona.
Bio from: July Meetup: AI Song Generation and surfing attention