Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in arXiv, 2020
Recommended citation: Jiajun Fan, He Ba, Xian Guo, Jianye Hao, "Critic PI2: Master Continuous Planning via Policy Improvement with Path Integrals and Deep Actor-Critic Reinforcement Learning." arXiv, 2020. https://arxiv.org/abs/2011.06752
Published in the Proceedings of the AAAI-22 Workshop on Reinforcement Learning in Games, 2021
Recommended citation: Jiajun Fan, "A Review for Deep Reinforcement Learning in Atari: Benchmarks, Challenges, and Solutions." In Proceedings of the AAAI-22 Workshop on Reinforcement Learning in Games, 2021. https://arxiv.org/abs/2112.04145
Published in arXiv, 2021
Recommended citation: Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng, "An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning." arXiv, 2021. https://arxiv.org/abs/2106.00707
Published in arXiv, 2021
Recommended citation: Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng, "CASA: A Bridge Between Gradient of Policy Improvement and Policy Evaluation." arXiv, 2021. https://arxiv.org/abs/2105.03923
Published in the Proceedings of the AAAI-22 Workshop on Reinforcement Learning in Games, 2021
Recommended citation: Jiajun Fan, Changnan Xiao, Yue Huang, "GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning." In Proceedings of the AAAI-22 Workshop on Reinforcement Learning in Games, 2021. https://arxiv.org/abs/2106.06232
Published in the Proceedings of the Deep Reinforcement Learning Workshop at NeurIPS 2022, 2022
Recommended citation: Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng, Haiyan Yin, "CASA: Bridging the Gap between Policy Improvement and Policy Evaluation with Conflict Averse Policy Iteration." In Proceedings of the Deep Reinforcement Learning Workshop at NeurIPS 2022, 2022. https://arxiv.org/abs/2105.03923
Published in arXiv, 2022
Recommended citation: Hao Wang, Zhichao Chen, Jiajun Fan, Yuxin Huang, Weiming Liu, Xinggao Liu, "Entire Space Counterfactual Learning: Tuning, Analytical Properties and Industrial Applications." arXiv, 2022. https://doi.org/10.48550/arXiv.2210.11039
Published in the Proceedings of the International Conference on Machine Learning (ICML 2022), 17-23 July 2022, Baltimore, Maryland, USA
Recommended citation: Jiajun Fan, Changnan Xiao, "Generalized Data Distribution Iteration." In Proceedings of the International Conference on Machine Learning (ICML 2022), Baltimore, Maryland, USA, 2022. https://proceedings.mlr.press/v162/fan22c.html
Published in International Conference on Learning Representations 2023 (ICLR 2023, Oral — Notable Top 5%), 2023
We propose LBC (Learnable Behavior Control), a unified framework enabling a significantly enlarged behavior selection space via a hybrid behavior mapping. Our agents achieved a 10077.52% mean human-normalized score and surpassed 24 human world records within 1B training frames, demonstrating SOTA performance with exceptional sample efficiency.
Recommended citation: Jiajun Fan, Yuzheng Zhuang, Yuecheng Liu, Jianye Hao, Bin Wang, Jiangcheng Zhu, Hao Wang, Shu-Tao Xia. "Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection." ICLR 2023, oral (ranked 5/4176). https://openreview.net/forum?id=FeWvD0L_a4
Published in Conference on Neural Information Processing Systems 2023 (NeurIPS 2023), 2023
We propose an optimal transport-based framework for treatment effect estimation that addresses selection bias via balanced representation learning, achieving substantially better performance than state-of-the-art methods. A toy sketch of the balancing idea follows the citation below.
Recommended citation: Hao Wang, Jiajun Fan, Zhichao Chen, Haoran Li, Weiming Liu, Tianqiao Liu, Quanyu Dai, Yichao Wang, Zhenhua Dong, Ruiming Tang. "Optimal Transport for Treatment Effect Estimation." NeurIPS 2023. https://arxiv.org/abs/2310.18286
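As a rough, non-authoritative illustration of the balancing idea, the sketch below computes a generic entropic optimal transport cost between treated and control representations, the kind of penalty a balanced-representation method can add to its outcome loss. This is not the paper's implementation; the function names, batch sizes, and regularization strength are assumptions.

```python
# Illustrative only: entropic OT (Sinkhorn) as a balance penalty between
# treated and control representations. Names and constants are assumptions.
import numpy as np

def sinkhorn_cost(x_treated, x_control, eps=0.1, n_iters=200):
    """Entropic-regularized OT cost between two point clouds (uniform weights)."""
    n, m = len(x_treated), len(x_control)
    cost = ((x_treated[:, None, :] - x_control[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()                      # normalize for numerical stability
    K = np.exp(-cost / eps)                       # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    v = np.ones(m)
    for _ in range(n_iters):                      # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]            # approximate transport plan
    return (plan * cost).sum()

# Toy usage: encoder representations of treated vs. control units.
rng = np.random.default_rng(0)
phi_treated = rng.normal(0.5, 1.0, size=(32, 16))
phi_control = rng.normal(0.0, 1.0, size=(48, 16))
print(f"OT balance penalty: {sinkhorn_cost(phi_treated, phi_control):.4f}")
```

In a full pipeline, such a penalty would be added to the factual outcome loss and back-propagated through the representation encoder.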
Published in NeurIPS 2024 Workshop on AI for New Drug Modalities (AI4Mat), 2024
We combine the strengths of model-free RL and design-based RL, improving multiple metrics on MuJoCo control tasks, and introduce behavior control into design-based RL to improve sample efficiency.
Recommended citation: Jiajun Fan, et al. "Efficient Design-and-Control Automation with Reinforcement Learning and Adaptive Exploration." NeurIPS 2024 Workshop AI4Mat.
Published in International Conference on Learning Representations 2025 (ICLR 2025), 2025
We introduce ORW-CFM-W2, a self-evolving RLHF framework enabling flow matching models to continuously optimize through online reward feedback without relying on human-collected datasets. We derive a tractable Wasserstein-2 bound providing the first theoretical guarantee for collapse-free policy evolution. An illustrative sketch of the reward-weighted objective appears after the citation.
Recommended citation: Jiajun Fan, Shuaike Shen, Chaoran Cheng, Yuxin Chen, Chumeng Liang, Ge Liu. "Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization." ICLR 2025. https://openreview.net/forum?id=2IoFFexvuw
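A rough sketch of what online reward-weighted flow matching with a reference-model penalty can look like. This is an illustrative stand-in rather than the paper's exact objective or bound; the network shapes, the reward-weighting temperature, and the penalty coefficient are all assumptions.

```python
# Illustrative sketch: reward-weighted conditional flow matching plus a penalty
# keeping the fine-tuned velocity field near a frozen reference (a crude stand-in
# for a Wasserstein-2 regularizer). All modules and constants are assumptions.
import torch
import torch.nn as nn

dim, batch = 8, 64
policy = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))
reference = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))
reference.load_state_dict(policy.state_dict())
for p in reference.parameters():
    p.requires_grad_(False)

def velocity(net, x, t):
    return net(torch.cat([x, t], dim=-1))

x1 = torch.randn(batch, dim)                 # stands in for samples generated online
reward = -x1.pow(2).mean(dim=-1)             # toy reward; a real scorer goes here
weights = torch.softmax(5.0 * reward, dim=0) # reward weighting (temperature is assumed)

x0 = torch.randn(batch, dim)                 # noise endpoints
t = torch.rand(batch, 1)
xt = (1 - t) * x0 + t * x1                   # linear interpolation path
target_v = x1 - x0                           # conditional flow matching target

v = velocity(policy, xt, t)
fm_loss = (weights * (v - target_v).pow(2).mean(dim=-1)).sum()
ref_penalty = (v - velocity(reference, xt, t)).pow(2).mean()
loss = fm_loss + 0.1 * ref_penalty           # penalty weight 0.1 is an assumption
loss.backward()
print(float(loss))
```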
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
PRANCE is a Vision Transformer compression framework that jointly optimizes token usage and structural channel pruning, achieving state-of-the-art efficiency-accuracy trade-offs for adaptive ViT inference. A minimal channel-pruning sketch follows the citation.
Recommended citation: Ye Li, Chen Tang, Yuan Meng, Jiajun Fan, Zenghao Chai, Xinzhu Ma, Zhi Wang, Wenwu Zhu. "PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference." TPAMI 2025. https://arxiv.org/abs/2407.05010
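For intuition, here is a minimal sketch of magnitude-based structural channel pruning on a ViT-style MLP layer. PRANCE learns its pruning and token decisions jointly; the L1 criterion and fixed keep ratio below are assumptions, not the paper's learned policy.

```python
# Illustrative structural channel pruning: keep output channels of a linear layer
# with the largest L1 weight norm. Criterion and ratio are assumptions.
import torch
import torch.nn as nn

def prune_channels(linear: nn.Linear, keep_ratio: float) -> nn.Linear:
    """Return a smaller linear layer keeping the highest-magnitude output channels."""
    scores = linear.weight.abs().sum(dim=1)            # one score per output channel
    k = max(1, int(keep_ratio * linear.out_features))
    keep = scores.topk(k).indices.sort().values
    pruned = nn.Linear(linear.in_features, k, bias=linear.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(linear.weight[keep])
        if linear.bias is not None:
            pruned.bias.copy_(linear.bias[keep])
    return pruned

mlp_fc1 = nn.Linear(384, 1536)                          # typical ViT-Small MLP expansion
print(prune_channels(mlp_fc1, keep_ratio=0.5))          # -> Linear(384, 768)
```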
Published in The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), 2025
We propose ADRPO, which dynamically adjusts divergence regularization strength based on advantage estimates, reducing regularization for high-value samples while applying stronger constraints to poor samples. It enables a 2B SD3 model to surpass 4.8B/12B models and also generalizes to LLMs and multimodal reasoning. A small sketch of the adaptive regularization rule follows the citation.
Recommended citation: Jiajun Fan, Tong Wei, Chaoran Cheng, Yuxin Chen, Ge Liu. "Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models." NeurIPS 2025. https://openreview.net/forum?id=aXO0xg0ttW
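A minimal sketch of the adaptive-regularization idea under assumed constants: the per-sample divergence coefficient shrinks as the advantage grows and increases for low-advantage samples. The exponential schedule, clamp range, and log-ratio KL surrogate are illustrative choices, not the paper's exact rule.

```python
# Illustrative advantage-adaptive divergence regularization. Constants are assumptions.
import torch

def adaptive_beta(advantages, beta_base=0.1, sensitivity=1.0,
                  beta_min=0.01, beta_max=1.0):
    """Per-sample regularization strength, decreasing in the (standardized) advantage."""
    adv = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    return (beta_base * torch.exp(-sensitivity * adv)).clamp(beta_min, beta_max)

def adrpo_style_loss(log_prob, ref_log_prob, advantages):
    beta = adaptive_beta(advantages)
    policy_term = -(advantages.detach() * log_prob)      # reward-seeking term
    kl_term = beta * (log_prob - ref_log_prob.detach())  # log-ratio surrogate for divergence
    return (policy_term + kl_term).mean()

# Toy usage with made-up per-sample quantities.
log_prob = torch.randn(8, requires_grad=True)
ref_log_prob = torch.randn(8)
advantages = torch.randn(8)
loss = adrpo_style_loss(log_prob, ref_log_prob, advantages)
loss.backward()
print(float(loss), adaptive_beta(advantages))
```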
Published in The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), 2025
We propose VarCon, reformulating supervised contrastive learning as variational inference over latent class variables. Achieves SOTA 79.36% Top-1 on ImageNet-1K with ResNet-50, with superior embedding space organization and few-shot robustness.
Recommended citation: Ziwen Wang, Jiajun Fan, Thao Nguyen, Heng Ji, Ge Liu. "Variational Supervised Contrastive Learning." NeurIPS 2025. https://openreview.net/forum?id=uOOlHOq500
Published in arXiv preprint (Under Review), 2025
We propose ProteinZero, a self-improving framework for protein generation via online reinforcement learning. The model iteratively improves its own protein design capabilities without human-curated data, demonstrating how RL-based self-evolution can be extended to biological sequence generation.
Recommended citation: Ziwen Wang, Jiajun Fan, Rui Guo, Thao Nguyen, Heng Ji, Ge Liu. "ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning." arXiv:2506.07459, 2025. https://arxiv.org/abs/2506.07459
Published in arXiv preprint (Under Review), 2025
We present AC-Flow, a robust actor-critic framework for fine-tuning flow matching generative models with intermediate feedback. Key innovations include reward shaping for stable intermediate value learning, a dual-stability mechanism combining advantage clipping with critic warm-up, and a scalable generalized critic weighting scheme with Wasserstein regularization. Achieves SOTA text-to-image alignment on Stable Diffusion 3 without compromising diversity or stability. A toy sketch of the dual-stability mechanism appears after the citation.
Recommended citation: Jiajun Fan, Chaoran Cheng, Shuaike Shen, Xiangxin Zhou, Ge Liu. "Fine-tuning Flow Matching Generative Models with Intermediate Feedback." arXiv:2510.18072, 2025. https://arxiv.org/abs/2510.18072
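A toy sketch of the two stability ingredients as summarized above: the critic trains from step zero, the actor only after a warm-up period, and advantages are normalized and clipped before the actor update. The modules, warm-up length, clip value, and per-sample score are placeholders, not the released training code.

```python
# Illustrative dual-stability loop: critic warm-up gating plus advantage clipping.
# All shapes, constants, and the toy reward/score are assumptions.
import torch
import torch.nn as nn

actor = nn.Linear(8, 8)        # stand-in for the flow/velocity network being fine-tuned
critic = nn.Linear(8, 1)       # stand-in for the intermediate-state value model
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

WARMUP_STEPS, ADV_CLIP = 200, 2.0

for step in range(400):
    x = torch.randn(32, 8)                        # toy intermediate states
    reward = -x.pow(2).mean(dim=1, keepdim=True)  # toy intermediate feedback

    # Critic regression toward observed rewards (trains from step 0).
    critic_loss = (critic(x) - reward).pow(2).mean()
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Actor update gated by critic warm-up, using normalized, clipped advantages.
    if step >= WARMUP_STEPS:
        adv = (reward - critic(x)).detach()
        adv = ((adv - adv.mean()) / (adv.std() + 1e-8)).clamp(-ADV_CLIP, ADV_CLIP)
        score = -(actor(x) - x).pow(2).mean(dim=1, keepdim=True)  # toy per-sample score
        actor_loss = -(adv * score).mean()
        opt_actor.zero_grad()
        actor_loss.backward()
        opt_actor.step()
```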
Published in International Conference on Learning Representations 2026 (ICLR 2026), 2026
We propose CESAR, an online RL framework built on GRPO with multi-faceted reasoning process rewards incentivizing consistency, structured analytical patterns, and calibrated depth. Resolves test-time inverse scaling in Audio LLMs; achieves SOTA on MMAU Test-mini, substantially outperforming Gemini 2.5 Pro and GPT-4o Audio. A short sketch of the reward combination appears after the citation.
Recommended citation: Jiajun Fan, Roger Ren, Jingyuan Li, Rahul Pandey, Prashanth G. Shivakumar, Yile Gu, Ankur Gandhe, Ge Liu, Ivan Bulyko. "Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards." ICLR 2026. https://openreview.net/forum?id=DUr48hxO2h
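A hedged sketch of the reward plumbing: several process-reward facets (named consistency, structure, and depth here, following the summary) are combined into one scalar per response and turned into GRPO-style group-relative advantages. The facet weights and group shapes are assumptions.

```python
# Illustrative multi-faceted process reward plus GRPO group-relative advantages.
# Facet weights and group sizes are assumptions.
import torch

def combined_reward(consistency, structure, depth, weights=(0.4, 0.3, 0.3)):
    w_c, w_s, w_d = weights
    return w_c * consistency + w_s * structure + w_d * depth

def grpo_advantages(rewards_per_group):
    """Standardize rewards within each prompt's group of sampled responses."""
    mean = rewards_per_group.mean(dim=1, keepdim=True)
    std = rewards_per_group.std(dim=1, keepdim=True)
    return (rewards_per_group - mean) / (std + 1e-8)

# Toy usage: 2 prompts, 4 sampled responses each, with per-facet process scores.
consistency = torch.rand(2, 4)
structure = torch.rand(2, 4)
depth = torch.rand(2, 4)
rewards = combined_reward(consistency, structure, depth)
print(grpo_advantages(rewards))
```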
Published in International Conference on Learning Representations 2026 (ICLR 2026), 2026
SP-VLA unifies model scheduling and token pruning for VLA acceleration, achieving a 1.5× lossless speedup on LIBERO and 2.4× on SimplerEnv, with up to a 6% average performance gain. A brief scheduling-and-pruning sketch follows the citation.
Recommended citation: Ye Li, Yuan Meng, Zewen Sun, Kangye Ji, Chen Tang, Jiajun Fan, Xinzhu Ma, Shu-Tao Xia, Zhi Wang, Wenwu Zhu. "SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration." ICLR 2026. https://openreview.net/forum?id=RwdGIIjPlC
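An illustrative sketch (not the released system) of the two levers named above: route a step to a lightweight policy or to the full VLA model depending on a difficulty score, and prune low-saliency visual tokens before the expensive forward pass. The difficulty score, threshold, keep ratio, and stand-in modules are all assumptions.

```python
# Illustrative model scheduling plus token pruning for a VLA-style policy.
# Modules, scores, and thresholds are assumptions.
import torch
import torch.nn as nn

full_model = nn.Linear(256, 7)    # stand-in for the large VLA policy head
light_model = nn.Linear(256, 7)   # stand-in for a cheap fallback policy

def prune_tokens(tokens, saliency, keep_ratio=0.5):
    """Keep the most salient visual tokens before the policy forward pass."""
    k = max(1, int(keep_ratio * tokens.shape[1]))
    idx = saliency.topk(k, dim=1).indices
    return torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))

def scheduled_action(tokens, saliency, difficulty, threshold=0.6):
    pooled = prune_tokens(tokens, saliency).mean(dim=1)   # cheap pooled observation
    model = full_model if difficulty > threshold else light_model
    return model(pooled)

tokens = torch.randn(1, 196, 256)
saliency = torch.rand(1, 196)
print(scheduled_action(tokens, saliency, difficulty=0.8).shape)  # torch.Size([1, 7])
```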
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk; note the different field in type. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.