talk-data.com

Speaker

Roya Norouzi Kandalan

1 talk

Google Developer Expert for AI/ML

Talks & appearances

Direct Preference Optimization (DPO) is a recent approach to fine-tuning large language models so that their outputs align with human preferences. Unlike reinforcement learning from human feedback (RLHF), DPO does not require training a separate reward model or running a reinforcement learning (RL) loop. Instead, it trains the model directly on preference pairs, raising the likelihood of the user-preferred output relative to the rejected one. This simplifies alignment while maintaining performance. This talk covers DPO's principles, its implementation, and its significance for refining AI in practical applications. It walks through the steps of preference-based fine-tuning, highlights DPO's efficiency in optimizing model outputs, and contrasts DPO with traditional RL-based methods, showing how it reduces overhead and improves scalability.
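To make "training directly on preference pairs" concrete, here is a minimal sketch of the standard DPO loss for a single pair, in plain Python. It assumes you already have the summed token log-probabilities of the chosen and rejected responses under the model being trained (the policy) and under a frozen reference model. The function names, the example numbers, and the default `beta=0.1` are illustrative assumptions, not taken from the talk. Note that no reward model or RL loop appears: the loss depends only on these four log-probabilities.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (x, y_chosen, y_rejected).

    Each argument is log pi(y | x) summed over the response tokens, taken
    from the policy being trained or from the frozen reference model.
    """
    # How much more likely the policy makes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled preference margin; beta plays the role of the KL-penalty
    # strength toward the reference model in the DPO derivation.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is equal to softplus(-margin).
    return softplus(-margin)


def dpo_batch_loss(pairs, beta: float = 0.1) -> float:
    """Mean DPO loss over a list of 4-tuples of log-probabilities."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)


# Hypothetical log-probabilities: the policy already favors the chosen
# response more than the reference model does.
example = (-10.0, -14.0, -12.0, -12.0)
print(dpo_loss(*example))  # below log(2), since the margin is positive
print(dpo_batch_loss([example, (-12.0, -12.0, -12.0, -12.0)]))
```

When the policy and reference agree exactly, the margin is zero and the loss is log(2); minimizing it pushes the policy to increase the chosen response's likelihood relative to the rejected one. In an actual training run these log-probabilities would come from forward passes of the two models and the loss would be minimized by gradient descent; the sketch shows only the objective.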