ICML 2022
Generalized Data Distribution Iteration
A unified RL framework achieving superhuman performance with 500× less data
Jiajun Fan¹, Changnan Xiao¹
¹Tsinghua University / Huawei Noah's Ark Lab
- 22 human world records broken
- 9,620% mean human-normalized score
- 500× more sample-efficient than Agent57
- 200M training frames (vs. 78B)
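The headline figures use the standard Atari human-normalized score (HNS). It is defined as HNS = (agent − random) / (human − random) × 100%, so 0% matches a random-action baseline and 100% matches the human reference score. The snippet below is a minimal sketch of how a mean and median HNS are aggregated across games. The game names and per-game scores are made-up placeholders, not results from the paper.

```python
from statistics import mean, median


def human_normalized_score(agent_score: float, random_score: float, human_score: float) -> float:
    """Return the human-normalized score (HNS) in percent.

    0% corresponds to the random baseline, 100% to the human reference score.
    """
    if human_score == random_score:
        raise ValueError("human and random reference scores must differ")
    return (agent_score - random_score) / (human_score - random_score) * 100.0


# Hypothetical (agent, random, human) scores per game; NOT numbers from the paper.
games = {
    "game_a": (5000.0, 200.0, 7000.0),
    "game_b": (120.0, 0.0, 30.0),
    "game_c": (21.0, -20.0, 15.0),
}

scores = [human_normalized_score(*triple) for triple in games.values()]
print(f"mean HNS = {mean(scores):.1f}%, median HNS = {median(scores):.1f}%")
```

A few games with very large scores can dominate the mean, so the median HNS is usually reported alongside it.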
TL;DR: Sample efficiency and final performance are two classic challenges in RL. GDI addresses both at once by showing that the training data distribution is the key lever. This view unifies diverse RL algorithms and achieves a 9,620% mean human-normalized score with only 200M frames (500× less data than Agent57).
Core Insight
GDI decouples these RL challenges into two problems and casts both as optimization of the training data distribution:
- Data richness: control the capacity and diversity of the behavior policy
- Exploration-exploitation trade-off: fine-grained, adaptive control of the sampling distribution
GDI embeds this view in Generalized Policy Iteration (GPI), yielding operator-based versions of well-known RL methods from DQN to Agent57, all of which are special cases of GDI.
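The page does not spell out the mechanism, so what follows is only a minimal sketch, under stated assumptions, of adaptive control of the data distribution. It assumes a behavior-policy family that mixes a softmax over Q-values (temperature `tau`) with a uniform policy (weight `eps`). A UCB1 bandit then picks `(tau, eps)` for each episode, using episode return as its reward. The policy family, the UCB1 meta-controller, the parameter grid and `run_episode` are all hypothetical illustrations, not the exact meta-controller from the GDI paper.

```python
import math
import random


def behavior_probs(q_values, tau, eps):
    """Hypothetical policy family: (1 - eps) * softmax(Q / tau) + eps * uniform.

    Small tau with small eps approaches DQN-style epsilon-greedy; larger tau or
    eps spreads probability mass and yields more diverse training data.
    """
    n = len(q_values)
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [(1 - eps) * e / z + eps / n for e in exps]


class UCB1:
    """UCB1 bandit over a discrete set of behavior-policy parameters (arms)."""

    def __init__(self, arms, c=1.0):
        self.arms = list(arms)
        self.c = c
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)
        self.t = 0

    def select(self):
        self.t += 1
        for i, n in enumerate(self.counts):
            if n == 0:  # try every arm once before using confidence bounds
                return i
        ucb = [m + self.c * math.sqrt(math.log(self.t) / n)
               for m, n in zip(self.means, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, i, reward):
        # Incremental running mean of the rewards observed for arm i.
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]


rng = random.Random(0)
arms = [(0.01, 0.01), (0.5, 0.05), (2.0, 0.2)]  # hypothetical (tau, eps) grid
bandit = UCB1(arms)


def run_episode(tau, eps):
    # Hypothetical stand-in for an actor episode; the middle arm has the best return.
    return 1.0 - abs(tau - 0.5) + rng.gauss(0.0, 0.1)


for _ in range(500):
    i = bandit.select()
    bandit.update(i, run_episode(*arms[i]))

best = max(range(len(arms)), key=bandit.counts.__getitem__)
print("most-selected (tau, eps):", arms[best])
```

The bandit gradually concentrates data collection on the behavior-policy settings that have paid off. This is one simple way to adapt the sampling distribution online rather than fixing it by hand.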
Publication Journey
Late 2021
Research & Development
Developed the GDI framework at Tsinghua University / Huawei Noah's Ark Lab. Core insight: optimizing the training data distribution improves sample efficiency and final performance within a single framework.
Jan 2022
Submitted to ICML 2022
Submitted to the 39th International Conference on Machine Learning.
May 2022
Accepted at ICML 2022
Accepted as a full paper. Published in PMLR Vol. 162, pages 6103–6184.
Jul 2022
Presented at ICML 2022
Cite This Paper
@InProceedings{pmlr-v162-fan22c,
title = {Generalized Data Distribution Iteration},
author = {Fan, Jiajun and Xiao, Changnan},
booktitle = {Proceedings of the 39th International Conference on Machine Learning},
pages = {6103--6184},
year = {2022},
volume = {162},
series = {Proceedings of Machine Learning Research},
month = {17--23 Jul},
publisher = {PMLR},
pdf = {https://proceedings.mlr.press/v162/fan22c/fan22c.pdf},
url = {https://proceedings.mlr.press/v162/fan22c.html}
}