Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models

Published in The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)

Recommended citation: Jiajun Fan, Tong Wei, Chaoran Cheng, Yuxin Chen, Ge Liu. "Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models." NeurIPS 2025. https://openreview.net/forum?id=aXO0xg0ttW

We propose ADRPO (Adaptive Divergence Regularized Policy Optimization), which dynamically adjusts the strength of divergence regularization based on advantage estimates, reducing regularization for high-value samples while applying stronger constraints to poor ones. ADRPO enables a 2B SD3 model to surpass 4.8B and 12B models, and it also generalizes to LLM fine-tuning and multimodal reasoning.
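
The core idea, a per-sample regularization coefficient that shrinks as the advantage grows, admits a compact sketch. Below is a minimal PyTorch illustration; the function name `adaptive_divergence_loss`, the sigmoid mapping from advantage to coefficient, and the base coefficient `beta0` are hypothetical illustration choices, not the paper's exact formulation.

```python
import torch

def adaptive_divergence_loss(logp_policy, logp_ref, advantages, beta0=0.1):
    """Policy-gradient loss with advantage-adaptive divergence regularization.

    logp_policy, logp_ref: log-probs of sampled actions under the fine-tuned
    policy and the frozen reference model; advantages: advantage estimates.
    """
    # Normalize advantages so the modulation is scale-free.
    adv = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    # High-advantage samples get a small coefficient (weak regularization);
    # low-advantage samples get a large one (strong pull toward the reference).
    # The sigmoid schedule here is an assumed example, not the paper's rule.
    beta = beta0 * torch.sigmoid(-adv)
    # Per-sample KL surrogate between policy and reference on sampled actions.
    kl = logp_policy - logp_ref
    # Standard policy-gradient term plus the adaptively weighted penalty.
    pg_loss = -(advantages.detach() * logp_policy).mean()
    reg_loss = (beta.detach() * kl).mean()
    return pg_loss + reg_loss
```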