Preprint 2025 · Under Review
Fine-tuning Flow Matching Generative Models with Intermediate Feedback
AC-Flow: a robust actor-critic framework for flow matching that enables stable intermediate value learning without collapse
Jiajun Fan 1,
Chaoran Cheng 1,
Shuaike Shen 1,
Xiangxin Zhou 1,
Ge Liu 1
1University of Illinois Urbana-Champaign
TL;DR: Existing RLHF methods for flow matching rely only on outcome rewards (ORW-CFM-W2) and suffer from credit-assignment problems. AC-Flow introduces a full actor-critic framework with intermediate feedback, combining reward shaping, a dual-stability mechanism, and generalized critic weighting, and achieves state-of-the-art text-to-image alignment on SD3 without degrading diversity or stability.
Three Key Innovations
1
Reward Shaping
Provides well-normalized learning signals for stable intermediate value learning and gradient control, enabling the critic to reason about multi-step trajectories.
2
Dual-Stability Mechanism
Combines advantage clipping (prevents destructive policy updates) with a critic warm-up phase (lets the critic mature before guiding the actor).
3
Generalized Critic Weighting
Extends reward-weighted methods while preserving model diversity via Wasserstein regularization; ORW-CFM-W2 is recovered as a special case (see the sketch after this list).
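To picture how the three components fit together, here is a minimal, illustrative training-step sketch in our own notation. It is not the authors' implementation: names such as shape_rewards, ADV_CLIP, WARMUP_STEPS, and BETA are hypothetical, the losses are simplified placeholders, and the Wasserstein-2 regularizer is omitted for brevity.

import torch
import torch.nn.functional as F

ADV_CLIP = 2.0        # hypothetical bound on the advantage (dual-stability, part a)
WARMUP_STEPS = 1000   # hypothetical critic warm-up length (dual-stability, part b)
BETA = 0.1            # hypothetical temperature for the critic weighting

def shape_rewards(raw_rewards):
    # Reward shaping (innovation 1): normalize rewards so the critic sees a
    # well-scaled target along the trajectory. A simple z-score is used here
    # purely for illustration.
    return (raw_rewards - raw_rewards.mean()) / (raw_rewards.std() + 1e-8)

def training_step(actor, critic, x_t, t, u_target, raw_rewards, step,
                  opt_actor, opt_critic):
    shaped = shape_rewards(raw_rewards)

    # Critic update: regress shaped intermediate returns at (x_t, t).
    value = critic(x_t, t).squeeze(-1)
    critic_loss = F.mse_loss(value, shaped)
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Dual-stability mechanism (innovation 2):
    # (a) advantage clipping bounds the signal that reaches the actor;
    # (b) critic warm-up lets the actor ignore the critic until it has matured.
    advantage = (shaped - value.detach()).clamp(-ADV_CLIP, ADV_CLIP)
    if step < WARMUP_STEPS:
        weight = torch.ones_like(advantage)
    else:
        # Generalized critic weighting (innovation 3): an exponential weight on
        # the flow-matching loss; with an outcome-only reward this reduces to a
        # reward-weighted scheme in the spirit of ORW-CFM-W2.
        weight = torch.exp(advantage / BETA)

    # Weighted conditional flow-matching loss for the actor (velocity field).
    # A Wasserstein-2 regularization term would be added here in practice.
    v_pred = actor(x_t, t)
    per_sample = ((v_pred - u_target) ** 2).flatten(1).mean(dim=1)
    actor_loss = (weight * per_sample).mean()
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()

    return actor_loss.item(), critic_loss.item()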
AC-Flow vs. ORW-CFM-W2
ORW-CFM-W2 (ICLR 2025)
- Outcome reward only
- No intermediate value learning
- Credit assignment challenge
- First online RLHF for flow matching
AC-Flow (This Work)
- Intermediate feedback + actor-critic
- Stable value learning via reward shaping
- Dual-stability prevents collapse
- State-of-the-art results on SD3 with even less data
AC-Flow generalizes ORW-CFM-W2: the critic weighting scheme subsumes reward-weighted methods as a special case.
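As a rough illustration of that relationship (our own notation, not taken from the paper), one can write a critic-weighted conditional flow-matching objective with Wasserstein-2 regularization as

\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_1,\, x_t}\big[\, w_\phi(x_t, t)\, \| v_\theta(x_t, t) - u_t(x_t \mid x_1) \|^2 \,\big] + \lambda\, W_2^2(p_\theta, p_{\mathrm{ref}})

where w_\phi is a critic-derived weight. Taking w_\phi(x_t, t) = \exp(r(x_1)/\beta), i.e. weighting by the outcome reward alone, recovers a reward-weighted objective of the ORW-CFM-W2 form. This is an assumed simplification for illustration, not the paper's exact formulation.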
Cite This Paper
@article{fan2025acflow,
title = {Fine-tuning Flow Matching Generative Models with Intermediate Feedback},
author = {Jiajun Fan and Chaoran Cheng and Shuaike Shen
and Xiangxin Zhou and Ge Liu},
journal = {arXiv preprint arXiv:2510.18072},
year = {2025},
url = {https://arxiv.org/abs/2510.18072}
}