📋 ICML 2022

Generalized Data Distribution Iteration

A unified RL framework achieving superhuman performance with 500× less data

Jiajun Fan1, Changnan Xiao1
1Tsinghua University / Huawei Noah's Ark Lab
22 — Human World Records Broken
9,620% — Mean Human-Normalized Score
500× — More Sample-Efficient than Agent57
200M — Training Frames (Agent57 uses 78B)
TL;DR — Sample efficiency and final performance are two classic RL challenges. GDI tackles both at once by treating the training data distribution as the key lever, unifying diverse RL algorithms under one framework and reaching a 9,620% mean human-normalized score with only 200M frames (500× less data than Agent57).

💡 Core Insight

GDI decouples the two classic RL challenges, sample efficiency and final performance, and casts both as a single training-data-distribution optimization problem.

GDI integrates this into Generalized Policy Iteration (GPI), providing operator-based versions of well-known RL methods from DQN to Agent57 — all as special cases of GDI.

The key formula: Generalized Bellman Operator with a data distribution operator D(·) that is jointly optimized with the value function — turning the data collection strategy itself into a learnable parameter.
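As a rough sketch (notation ours, not the paper's exact statement; see the paper for the precise operator definitions), each GDI iteration adds a data-distribution optimization step to GPI's evaluation/improvement loop:

```latex
% Schematic GDI iteration (notation ours)
\begin{align*}
  D_{k+1}   &= \mathcal{D}(D_k \mid \pi_k, Q_k)  && \text{data distribution optimization}\\
  Q_{k+1}   &= \mathcal{E}_{D_{k+1}}(Q_k, \pi_k) && \text{policy evaluation on data from } D_{k+1}\\
  \pi_{k+1} &= \mathcal{I}(Q_{k+1})              && \text{policy improvement}
\end{align*}
% Freezing D at a fixed distribution recovers ordinary GPI;
% particular choices of D recover DQN, PER, R2D2, or Agent57.
```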

Unified RL Algorithms as GDI Special Cases

DQN — uniform replay buffer (D = uniform)
PER — priority-weighted replay (D = prioritized)
R2D2 — recurrent experience replay with prioritized sequence sampling (D = sequence-prioritized)
Agent57 — population-based exploration (D = meta-controller over a policy population)
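To make the unification concrete, here is a minimal sketch (all names ours, not from the paper) of D as a pluggable sampler over a replay buffer: fixing D to uniform sampling gives DQN-style replay, while priority weighting gives PER-style replay.

```python
import random

# Minimal sketch (names ours): GDI treats the sampling distribution D over
# collected experience as a swappable, optimizable component. Fixing D to a
# particular non-learned form recovers classic algorithms as special cases.

def uniform_sampler(buffer, k):
    """DQN-style special case: D is uniform over the replay buffer."""
    return random.sample(range(len(buffer)), k)

def prioritized_sampler(buffer, priorities, k, alpha=0.6):
    """PER-style special case: D weights transition i by priorities[i] ** alpha."""
    weights = [p ** alpha for p in priorities]
    return random.choices(range(len(buffer)), weights=weights, k=k)

# Toy buffer of (state, action, reward, next_state) transitions.
buffer = [("s", "a", r, "s'") for r in range(10)]
priorities = [abs(r - 4.5) for r in range(10)]  # stand-in for |TD error|

batch_u = uniform_sampler(buffer, 4)                  # DQN: every index equally likely
batch_p = prioritized_sampler(buffer, priorities, 4)  # PER: high-priority indices favored
```

In GDI proper, D is not fixed but optimized jointly with the value function; the samplers above are the degenerate, non-learned special cases.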


📅 Publication Journey

Late 2021
Research & Development
Developed the GDI framework at Tsinghua / Huawei Noah's Ark Lab. Core insight: optimizing the training data distribution jointly addresses sample efficiency and final performance.
Jan 2022
Submitted to ICML 2022
Submitted to the 39th International Conference on Machine Learning.
May 2022
✅ Accepted at ICML 2022
Accepted as a full paper. Published in PMLR Vol. 162, pages 6103–6184.
Jul 2022
Presented at ICML 2022
Presented at ICML 2022 in Baltimore, Maryland. Available at proceedings.mlr.press.

📖 Cite This Paper

@InProceedings{pmlr-v162-fan22c,
  title     = {Generalized Data Distribution Iteration},
  author    = {Fan, Jiajun and Xiao, Changnan},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {6103--6184},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/fan22c/fan22c.pdf},
  url       = {https://proceedings.mlr.press/v162/fan22c.html}
}