Roya Norouzi Kandalan

Google Developer Expert for AI/ML

Talks & appearances

Direct Preference Optimization: What, Why, How

2025-04-09 · Google Cloud Next '25

session

AI/ML

Direct Preference Optimization (DPO) is an approach for fine-tuning large language models to align with human preferences. DPO avoids complexities of Reinforcement Learning (RL), such as reward modeling, by directly training the model to prefer user-selected outputs over rejected ones, which simplifies alignment while maintaining performance. This talk covers DPO's principles, implementation, and significance in refining AI for practical applications. It walks through the steps of preference-based adjustment and contrasts DPO with traditional RL, showing where DPO reduces overhead and improves scalability.
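
To make the idea of training directly on chosen and rejected outputs concrete, here is a minimal sketch of the per-pair DPO loss as it is commonly written. It is an illustration under stated assumptions, not code from the talk: the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are all hypothetical, and the inputs are assumed to be summed log-probabilities of each full response under the model being trained (the policy) and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss (illustrative sketch; names and beta are assumptions).

    Each argument is the log-probability of a whole response, either the
    preferred ("chosen") or the dispreferred ("rejected") one, under the
    policy being trained or under the frozen reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a
    # numerically stable way.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; it falls as the policy shifts probability toward the chosen response and rises if it shifts toward the rejected one. No separate reward model appears anywhere in the computation, which is the simplification the abstract describes.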