TL;DR: Sample efficiency and final performance are two long-standing challenges in RL. GDI tackles both at once by treating the training data distribution as the key lever, unifying a range of RL algorithms under one framework and reaching a 9620% mean human-normalized score on Atari with only 200M frames (500× fewer than Agent57).
💡 Core Insight
GDI decouples the RL problem into two sub-problems and casts both as optimization of the training data distribution:
Data richness → control the capacity and diversity of the behavior policy space
Exploration-exploitation trade-off → fine-grained, adaptive control of the sampling distribution over behavior policies
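The second lever can be made concrete with a minimal sketch: a bandit-style meta-controller that adaptively shifts sampling toward the behavior policies yielding the highest episodic returns. This is an illustrative toy (the class name, UCB rule, and simulated returns are assumptions, not the paper's implementation), in the spirit of the adaptive meta-controllers used by Agent57-style agents.

```python
import math
import random

class UCBMetaController:
    """Adaptively controls which behavior policy collects the next
    episode of data -- a toy sketch of GDI's 'sampling distribution'
    lever. Names and API are illustrative, not from the paper."""

    def __init__(self, n_policies, c=1.0):
        self.n = n_policies
        self.c = c                       # exploration bonus weight
        self.counts = [0] * n_policies   # times each policy was sampled
        self.means = [0.0] * n_policies  # running mean episodic return
        self.t = 0

    def select(self):
        """Pick the index of the behavior policy to run next (UCB1 rule)."""
        self.t += 1
        # Try every policy once before applying the UCB rule.
        for i in range(self.n):
            if self.counts[i] == 0:
                return i
        ucb = [self.means[i] + self.c * math.sqrt(math.log(self.t) / self.counts[i])
               for i in range(self.n)]
        return max(range(self.n), key=lambda i: ucb[i])

    def update(self, i, episodic_return):
        """Fold the observed return of policy i into its running mean."""
        self.counts[i] += 1
        self.means[i] += (episodic_return - self.means[i]) / self.counts[i]

if __name__ == "__main__":
    random.seed(0)
    # Hypothetical: three behavior policies with different exploration
    # rates; policy 2 happens to yield the highest expected return.
    true_returns = [0.2, 0.5, 0.8]
    mc = UCBMetaController(n_policies=3)
    for _ in range(500):
        i = mc.select()
        mc.update(i, true_returns[i] + random.gauss(0, 0.1))
    print(mc.counts)  # sampling mass concentrates on the best policy
```

In a full agent, `true_returns` would be replaced by actual episodic returns from each behavior policy, so the sampling distribution tracks whichever exploration setting is currently paying off.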
GDI builds this into Generalized Policy Iteration (GPI), providing operator-based formulations of well-known RL methods from DQN to Agent57, all of which emerge as special cases of GDI.
The key formula: a Generalized Bellman Operator equipped with a data distribution operator D(·) that is jointly optimized with the value function, turning the data collection strategy itself into a learnable component of the algorithm.
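The alternation can be written schematically as a two-step iteration (the notation below is a simplified sketch, not the paper's exact operator definitions):

```latex
% Schematic GDI iteration (illustrative symbols, hedged):
% D-step: optimize the data distribution given the current policy/value;
% I-step: run a generalized policy-iteration update on data drawn from it.
\begin{align*}
  \rho_{k+1} &= \mathcal{D}\bigl(\rho_k \mid \pi_k, V_k\bigr)
      && \text{(data distribution optimization)} \\
  (\pi_{k+1}, V_{k+1}) &= \mathrm{GPI}\bigl(\pi_k, V_k \mid \rho_{k+1}\bigr)
      && \text{(generalized policy iteration)}
\end{align*}
```

Fixing $\mathcal{D}$ to a static choice (e.g., a single ε-greedy behavior policy) recovers classic methods, which is why DQN through Agent57 appear as special cases.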
@InProceedings{pmlr-v162-fan22c,
title = {Generalized Data Distribution Iteration},
author = {Fan, Jiajun and Xiao, Changnan},
booktitle = {Proceedings of the 39th International Conference on Machine Learning},
pages = {6103--6184},
year = {2022},
editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
volume = {162},
series = {Proceedings of Machine Learning Research},
month = {17--23 Jul},
publisher = {PMLR},
pdf = {https://proceedings.mlr.press/v162/fan22c/fan22c.pdf},
url = {https://proceedings.mlr.press/v162/fan22c.html}
}