Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in arXiv, 2020
Recommended citation: Jiajun Fan, He Ba, Xian Guo, Jianye Hao, "Critic PI2: Master Continuous Planning via Policy Improvement with Path Integrals and Deep Actor-Critic Reinforcement Learning." arXiv, 2020. https://arxiv.org/abs/2011.06752
Published in the Proceedings of the AAAI-22 Workshop on Reinforcement Learning in Games, 2021
Recommended citation: Jiajun Fan, "A Review for Deep Reinforcement Learning in Atari: Benchmarks, Challenges, and Solutions." In Proceedings of the AAAI-22 Workshop on Reinforcement Learning in Games, 2021. https://arxiv.org/abs/2112.04145
Published in arXiv, 2021
Recommended citation: Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng, "An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning." arXiv, 2021. https://arxiv.org/abs/2106.00707
Published in arXiv, 2021
Recommended citation: Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng, "CASA: A Bridge Between Gradient of Policy Improvement and Policy Evaluation." arXiv, 2021. https://arxiv.org/abs/2105.03923
Published in the Proceedings of the AAAI-22 Workshop on Reinforcement Learning in Games, 2021
Recommended citation: Jiajun Fan, Changnan Xiao, Yue Huang, "GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning." In Proceedings of the AAAI-22 Workshop on Reinforcement Learning in Games, 2021. https://arxiv.org/abs/2106.06232
Published in the Proceedings of the Deep Reinforcement Learning Workshop at NeurIPS 2022, 2022
Recommended citation: Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng, Haiyan Yin, "CASA: Bridging the Gap between Policy Improvement and Policy Evaluation with Conflict Averse Policy Iteration." In Proceedings of the Deep Reinforcement Learning Workshop at NeurIPS 2022, 2022. https://arxiv.org/abs/2105.03923
Published in arXiv, 2022
Recommended citation: Hao Wang, Zhichao Chen, Jiajun Fan, Yuxin Huang, Weiming Liu, Xinggao Liu, "Entire Space Counterfactual Learning: Tuning, Analytical Properties and Industrial Applications." arXiv, 2022. https://doi.org/10.48550/arXiv.2210.11039
Published in the Proceedings of the International Conference on Machine Learning (ICML 2022), 17-23 July 2022, Baltimore, Maryland, USA
Recommended citation: Jiajun Fan, Changnan Xiao, "Generalized Data Distribution Iteration." In Proceedings of the International Conference on Machine Learning (ICML 2022), Baltimore, Maryland, USA, 2022. https://proceedings.mlr.press/v162/fan22c.html
Published in International Conference on Learning Representations 2023 (ICLR 2023, Oral — Notable Top 5%), 2023
We propose LBC (Learnable Behavior Control), a unified framework enabling a significantly enlarged behavior selection space via a hybrid behavior mapping. Our agents achieved a 10077.52% mean human-normalized score and surpassed 24 human world records within 1B training frames, demonstrating SOTA performance with exceptional sample efficiency.
Recommended citation: Jiajun Fan, Yuzheng Zhuang, Yuecheng Liu, Jianye Hao, Bin Wang, Jiangcheng Zhu, Hao Wang, Shu-Tao Xia. "Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection." ICLR 2023, oral (ranked 5/4176). https://openreview.net/forum?id=FeWvD0L_a4
Published in Conference on Neural Information Processing Systems 2023 (NeurIPS 2023), 2023
We propose an optimal transport-based framework for treatment effect estimation that addresses selection bias via balanced representation learning, achieving substantially better performance than state-of-the-art methods. A toy sketch of the balancing idea follows the citation below.
Recommended citation: Hao Wang, Jiajun Fan, Zhichao Chen, Haoran Li, Weiming Liu, Tianqiao Liu, Quanyu Dai, Yichao Wang, Zhenhua Dong, Ruiming Tang. "Optimal Transport for Treatment Effect Estimation." NeurIPS 2023. https://arxiv.org/abs/2310.18286
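As a rough, non-authoritative illustration of the balancing idea, the sketch below computes a generic entropic optimal transport cost between treated and control representations, the kind of penalty a balanced-representation method can add to its outcome loss. This is not the paper's implementation; the function names, batch sizes, and regularization strength are assumptions.

```python
# Illustrative only: entropic OT (Sinkhorn) as a balance penalty between
# treated and control representations. Names and constants are assumptions.
import numpy as np

def sinkhorn_cost(x_treated, x_control, eps=0.1, n_iters=200):
    """Entropic-regularized OT cost between two point clouds (uniform weights)."""
    n, m = len(x_treated), len(x_control)
    cost = ((x_treated[:, None, :] - x_control[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()                      # normalize for numerical stability
    K = np.exp(-cost / eps)                       # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    v = np.ones(m)
    for _ in range(n_iters):                      # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]            # approximate transport plan
    return (plan * cost).sum()

# Toy usage: encoder representations of treated vs. control units.
rng = np.random.default_rng(0)
phi_treated = rng.normal(0.5, 1.0, size=(32, 16))
phi_control = rng.normal(0.0, 1.0, size=(48, 16))
print(f"OT balance penalty: {sinkhorn_cost(phi_treated, phi_control):.4f}")
```

In a full pipeline, such a penalty would be added to the factual outcome loss and back-propagated through the representation encoder.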
Published in NeurIPS 2024 Workshop on AI for New Drug Modalities (AI4Mat), 2024
We combine the strengths of model-free RL and design-based RL, improving multiple metrics on MuJoCo control tasks, and introduce behavior control into design-based RL to improve sample efficiency.
Recommended citation: Jiajun Fan, et al. "Efficient Design-and-Control Automation with Reinforcement Learning and Adaptive Exploration." NeurIPS 2024 Workshop AI4Mat.
Published in International Conference on Learning Representations 2025 (ICLR 2025), 2025
We introduce ORW-CFM-W2, a self-evolving RLHF framework enabling flow matching models to continuously optimize through online reward feedback without relying on human-collected datasets. We derive a tractable Wasserstein-2 bound providing the first theoretical guarantee for collapse-free policy evolution. An illustrative sketch of the reward-weighted objective appears after the citation.
Recommended citation: Jiajun Fan, Shuaike Shen, Chaoran Cheng, Yuxin Chen, Chumeng Liang, Ge Liu. "Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization." ICLR 2025. https://openreview.net/forum?id=2IoFFexvuw
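A rough sketch of what online reward-weighted flow matching with a reference-model penalty can look like. This is an illustrative stand-in rather than the paper's exact objective or bound; the network shapes, the reward-weighting temperature, and the penalty coefficient are all assumptions.

```python
# Illustrative sketch: reward-weighted conditional flow matching plus a penalty
# keeping the fine-tuned velocity field near a frozen reference (a crude stand-in
# for a Wasserstein-2 regularizer). All modules and constants are assumptions.
import torch
import torch.nn as nn

dim, batch = 8, 64
policy = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))
reference = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))
reference.load_state_dict(policy.state_dict())
for p in reference.parameters():
    p.requires_grad_(False)

def velocity(net, x, t):
    return net(torch.cat([x, t], dim=-1))

x1 = torch.randn(batch, dim)                 # stands in for samples generated online
reward = -x1.pow(2).mean(dim=-1)             # toy reward; a real scorer goes here
weights = torch.softmax(5.0 * reward, dim=0) # reward weighting (temperature is assumed)

x0 = torch.randn(batch, dim)                 # noise endpoints
t = torch.rand(batch, 1)
xt = (1 - t) * x0 + t * x1                   # linear interpolation path
target_v = x1 - x0                           # conditional flow matching target

v = velocity(policy, xt, t)
fm_loss = (weights * (v - target_v).pow(2).mean(dim=-1)).sum()
ref_penalty = (v - velocity(reference, xt, t)).pow(2).mean()
loss = fm_loss + 0.1 * ref_penalty           # penalty weight 0.1 is an assumption
loss.backward()
print(float(loss))
```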
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
PRANCE is a Vision Transformer compression framework that jointly optimizes token usage and structural channel pruning, achieving state-of-the-art efficiency-accuracy trade-offs for adaptive ViT inference. A minimal channel-pruning sketch follows the citation.
Recommended citation: Ye Li, Chen Tang, Yuan Meng, Jiajun Fan, Zenghao Chai, Xinzhu Ma, Zhi Wang, Wenwu Zhu. "PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference." TPAMI 2025. https://arxiv.org/abs/2407.05010
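For intuition, here is a minimal sketch of magnitude-based structural channel pruning on a ViT-style MLP layer. PRANCE learns its pruning and token decisions jointly; the L1 criterion and fixed keep ratio below are assumptions, not the paper's learned policy.

```python
# Illustrative structural channel pruning: keep output channels of a linear layer
# with the largest L1 weight norm. Criterion and ratio are assumptions.
import torch
import torch.nn as nn

def prune_channels(linear: nn.Linear, keep_ratio: float) -> nn.Linear:
    """Return a smaller linear layer keeping the highest-magnitude output channels."""
    scores = linear.weight.abs().sum(dim=1)            # one score per output channel
    k = max(1, int(keep_ratio * linear.out_features))
    keep = scores.topk(k).indices.sort().values
    pruned = nn.Linear(linear.in_features, k, bias=linear.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(linear.weight[keep])
        if linear.bias is not None:
            pruned.bias.copy_(linear.bias[keep])
    return pruned

mlp_fc1 = nn.Linear(384, 1536)                          # typical ViT-Small MLP expansion
print(prune_channels(mlp_fc1, keep_ratio=0.5))          # -> Linear(384, 768)
```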
Published in The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), 2025
We propose ADRPO, which dynamically adjusts divergence regularization strength based on advantage estimates, reducing regularization for high-value samples while applying stronger constraints to poor samples. It enables a 2B SD3 model to surpass 4.8B/12B models and also generalizes to LLMs and multimodal reasoning. A small sketch of the adaptive regularization rule follows the citation.
Recommended citation: Jiajun Fan, Tong Wei, Chaoran Cheng, Yuxin Chen, Ge Liu. "Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models." NeurIPS 2025. https://openreview.net/forum?id=aXO0xg0ttW
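A minimal sketch of the adaptive-regularization idea under assumed constants: the per-sample divergence coefficient shrinks as the advantage grows and increases for low-advantage samples. The exponential schedule, clamp range, and log-ratio KL surrogate are illustrative choices, not the paper's exact rule.

```python
# Illustrative advantage-adaptive divergence regularization. Constants are assumptions.
import torch

def adaptive_beta(advantages, beta_base=0.1, sensitivity=1.0,
                  beta_min=0.01, beta_max=1.0):
    """Per-sample regularization strength, decreasing in the (standardized) advantage."""
    adv = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    return (beta_base * torch.exp(-sensitivity * adv)).clamp(beta_min, beta_max)

def adrpo_style_loss(log_prob, ref_log_prob, advantages):
    beta = adaptive_beta(advantages)
    policy_term = -(advantages.detach() * log_prob)      # reward-seeking term
    kl_term = beta * (log_prob - ref_log_prob.detach())  # log-ratio surrogate for divergence
    return (policy_term + kl_term).mean()

# Toy usage with made-up per-sample quantities.
log_prob = torch.randn(8, requires_grad=True)
ref_log_prob = torch.randn(8)
advantages = torch.randn(8)
loss = adrpo_style_loss(log_prob, ref_log_prob, advantages)
loss.backward()
print(float(loss), adaptive_beta(advantages))
```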
Published in The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), 2025
We propose VarCon, reformulating supervised contrastive learning as variational inference over latent class variables. Achieves SOTA 79.36% Top-1 on ImageNet-1K with ResNet-50, with superior embedding space organization and few-shot robustness.
Recommended citation: Ziwen Wang, Jiajun Fan, Thao Nguyen, Heng Ji, Ge Liu. "Variational Supervised Contrastive Learning." NeurIPS 2025. https://openreview.net/forum?id=uOOlHOq500
Published in arXiv preprint (Under Review), 2025
We propose ProteinZero, a self-improving framework for protein generation via online reinforcement learning. The model iteratively improves its own protein design capabilities without human-curated data, demonstrating how RL-based self-evolution can be extended to biological sequence generation.
Recommended citation: Ziwen Wang, Jiajun Fan, Rui Guo, Thao Nguyen, Heng Ji, Ge Liu. "ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning." arXiv:2506.07459, 2025. https://arxiv.org/abs/2506.07459
Published in arXiv preprint (Under Review), 2025
We present AC-Flow, a robust actor-critic framework for fine-tuning flow matching generative models with intermediate feedback. Key innovations include reward shaping for stable intermediate value learning, a dual-stability mechanism combining advantage clipping with critic warm-up, and a scalable generalized critic weighting scheme with Wasserstein regularization. Achieves SOTA text-to-image alignment on Stable Diffusion 3 without compromising diversity or stability. A toy sketch of the dual-stability mechanism appears after the citation.
Recommended citation: Jiajun Fan, Chaoran Cheng, Shuaike Shen, Xiangxin Zhou, Ge Liu. "Fine-tuning Flow Matching Generative Models with Intermediate Feedback." arXiv:2510.18072, 2025. https://arxiv.org/abs/2510.18072
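A toy sketch of the two stability ingredients as summarized above: the critic trains from step zero, the actor only after a warm-up period, and advantages are normalized and clipped before the actor update. The modules, warm-up length, clip value, and per-sample score are placeholders, not the released training code.

```python
# Illustrative dual-stability loop: critic warm-up gating plus advantage clipping.
# All shapes, constants, and the toy reward/score are assumptions.
import torch
import torch.nn as nn

actor = nn.Linear(8, 8)        # stand-in for the flow/velocity network being fine-tuned
critic = nn.Linear(8, 1)       # stand-in for the intermediate-state value model
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

WARMUP_STEPS, ADV_CLIP = 200, 2.0

for step in range(400):
    x = torch.randn(32, 8)                        # toy intermediate states
    reward = -x.pow(2).mean(dim=1, keepdim=True)  # toy intermediate feedback

    # Critic regression toward observed rewards (trains from step 0).
    critic_loss = (critic(x) - reward).pow(2).mean()
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Actor update gated by critic warm-up, using normalized, clipped advantages.
    if step >= WARMUP_STEPS:
        adv = (reward - critic(x)).detach()
        adv = ((adv - adv.mean()) / (adv.std() + 1e-8)).clamp(-ADV_CLIP, ADV_CLIP)
        score = -(actor(x) - x).pow(2).mean(dim=1, keepdim=True)  # toy per-sample score
        actor_loss = -(adv * score).mean()
        opt_actor.zero_grad()
        actor_loss.backward()
        opt_actor.step()
```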
Published in International Conference on Learning Representations 2026 (ICLR 2026), 2026
We propose CESAR, an online RL framework built on GRPO with multi-faceted reasoning process rewards incentivizing consistency, structured analytical patterns, and calibrated depth. Resolves test-time inverse scaling in Audio LLMs; achieves SOTA on MMAU Test-mini, substantially outperforming Gemini 2.5 Pro and GPT-4o Audio. A short sketch of the reward combination appears after the citation.
Recommended citation: Jiajun Fan, Roger Ren, Jingyuan Li, Rahul Pandey, Prashanth G. Shivakumar, Yile Gu, Ankur Gandhe, Ge Liu, Ivan Bulyko. "Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards." ICLR 2026. https://openreview.net/forum?id=DUr48hxO2h
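A hedged sketch of the reward plumbing: several process-reward facets (named consistency, structure, and depth here, following the summary) are combined into one scalar per response and turned into GRPO-style group-relative advantages. The facet weights and group shapes are assumptions.

```python
# Illustrative multi-faceted process reward plus GRPO group-relative advantages.
# Facet weights and group sizes are assumptions.
import torch

def combined_reward(consistency, structure, depth, weights=(0.4, 0.3, 0.3)):
    w_c, w_s, w_d = weights
    return w_c * consistency + w_s * structure + w_d * depth

def grpo_advantages(rewards_per_group):
    """Standardize rewards within each prompt's group of sampled responses."""
    mean = rewards_per_group.mean(dim=1, keepdim=True)
    std = rewards_per_group.std(dim=1, keepdim=True)
    return (rewards_per_group - mean) / (std + 1e-8)

# Toy usage: 2 prompts, 4 sampled responses each, with per-facet process scores.
consistency = torch.rand(2, 4)
structure = torch.rand(2, 4)
depth = torch.rand(2, 4)
rewards = combined_reward(consistency, structure, depth)
print(grpo_advantages(rewards))
```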
Published in International Conference on Learning Representations 2026 (ICLR 2026), 2026
SP-VLA unifies model scheduling and token pruning for VLA acceleration, achieving a 1.5× lossless speedup on LIBERO and 2.4× on SimplerEnv, with up to a 6% average performance gain. A brief scheduling-and-pruning sketch follows the citation.
Recommended citation: Ye Li, Yuan Meng, Zewen Sun, Kangye Ji, Chen Tang, Jiajun Fan, Xinzhu Ma, Shu-Tao Xia, Zhi Wang, Wenwu Zhu. "SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration." ICLR 2026. https://openreview.net/forum?id=RwdGIIjPlC
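An illustrative sketch (not the released system) of the two levers named above: route a step to a lightweight policy or to the full VLA model depending on a difficulty score, and prune low-saliency visual tokens before the expensive forward pass. The difficulty score, threshold, keep ratio, and stand-in modules are all assumptions.

```python
# Illustrative model scheduling plus token pruning for a VLA-style policy.
# Modules, scores, and thresholds are assumptions.
import torch
import torch.nn as nn

full_model = nn.Linear(256, 7)    # stand-in for the large VLA policy head
light_model = nn.Linear(256, 7)   # stand-in for a cheap fallback policy

def prune_tokens(tokens, saliency, keep_ratio=0.5):
    """Keep the most salient visual tokens before the policy forward pass."""
    k = max(1, int(keep_ratio * tokens.shape[1]))
    idx = saliency.topk(k, dim=1).indices
    return torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))

def scheduled_action(tokens, saliency, difficulty, threshold=0.6):
    pooled = prune_tokens(tokens, saliency).mean(dim=1)   # cheap pooled observation
    model = full_model if difficulty > threshold else light_model
    return model(pooled)

tokens = torch.randn(1, 196, 256)
saliency = torch.rand(1, 196)
print(scheduled_action(tokens, saliency, difficulty=0.8).shape)  # torch.Size([1, 7])
```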
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk; note the different field in type. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.