The Exploration-Exploitation Dilemma in RLHF for Generative Models
Published:
A deep dive into why fixed regularization in RLHF leads to diversity collapse, and how adaptive sample-level control resolves the exploration-exploitation dilemma.
