Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization
Published in the International Conference on Learning Representations (ICLR 2025), 2025
Recommended citation: Jiajun Fan, Shuaike Shen, Chaoran Cheng, Yuxin Chen, Chumeng Liang, Ge Liu. "Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization." ICLR 2025. https://openreview.net/forum?id=2IoFFexvuw
We introduce ORW-CFM-W2, a self-evolving RLHF framework that enables flow matching models to continuously improve through online reward feedback without relying on human-collected datasets. We derive a tractable Wasserstein-2 bound that provides the first theoretical guarantee of collapse-free policy evolution.
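To make the idea concrete, here is a minimal toy sketch of the two ingredients the abstract names: a reward-weighted conditional flow matching loss and a Wasserstein-2-style regularizer. Everything below is illustrative, not the paper's implementation: the linear velocity field `velocity`, the exponential reward weighting, and the use of the mean squared gap between the fine-tuned and reference velocity fields as a tractable surrogate for the W2 penalty are all simplifying assumptions made for this one-dimensional example.

```python
import numpy as np

rng = np.random.default_rng(0)

def velocity(theta, x, t):
    # Hypothetical toy velocity field: v_theta(x, t) = theta[0]*x + theta[1]*t.
    return theta[0] * x + theta[1] * t

def orw_w2_loss(theta, theta_ref, x1, reward, lam=0.1):
    """Toy reward-weighted CFM loss with a W2-style penalty (illustrative only).

    Samples x1 are re-weighted by exp(reward); the regularizer penalizes the
    squared distance between the fine-tuned and reference velocity fields,
    standing in for the paper's tractable Wasserstein-2 bound.
    """
    t = rng.uniform(size=x1.shape)
    x0 = rng.standard_normal(x1.shape)       # noise endpoint of the probability path
    xt = (1 - t) * x0 + t * x1               # linear interpolation between noise and data
    target = x1 - x0                         # conditional velocity target for this path
    w = np.exp(reward - reward.max())        # reward weights, shifted for stability
    w = w / w.sum()
    cfm = np.sum(w * (velocity(theta, xt, t) - target) ** 2)
    w2_pen = np.mean((velocity(theta, xt, t) - velocity(theta_ref, xt, t)) ** 2)
    return cfm + lam * w2_pen
```

In an online loop, one would repeatedly sample from the current model, score the samples with the reward, and descend this objective; the penalty term keeps the fine-tuned field close to the reference field, which is what prevents the policy from collapsing onto a few high-reward modes.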
