A unified RL framework achieving superhuman performance with 500× less data
GDI decouples the challenges of RL into two problems and casts both as optimization of the training data distribution.
GDI integrates this data distribution optimization into Generalized Policy Iteration (GPI), providing operator-based versions of well-known RL methods from DQN to Agent57, all as special cases of GDI.
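The idea of interleaving a data distribution step with a GPI step can be illustrated with a toy loop. This is a minimal sketch and not the authors' implementation. The three-action task, the index set of epsilon-greedy behavior policies, the softmax scoring rule, and every name below (`run_gdi_sketch`, `lambdas`, `scores`) are assumptions made for illustration only.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def run_gdi_sketch(num_iters=200, seed=0):
    rng = random.Random(seed)

    # Hypothetical toy task: three actions with fixed mean rewards.
    arm_means = [0.2, 0.5, 0.8]

    # Assumed index set: each entry is the epsilon of one
    # epsilon-greedy behavior policy.
    lambdas = [0.0, 0.1, 0.5]

    q = [0.0] * len(arm_means)      # action-value estimates
    scores = [0.0] * len(lambdas)   # running mean reward per index
    counts = [0] * len(lambdas)
    dist = softmax(scores)          # sampling distribution over indices

    for _ in range(num_iters):
        # 1. Sample a behavior-policy index from the current distribution.
        i = rng.choices(range(len(lambdas)), weights=dist)[0]
        eps = lambdas[i]

        # 2. Collect one sample with the selected behavior policy.
        if rng.random() < eps:
            a = rng.randrange(len(q))
        else:
            a = max(range(len(q)), key=q.__getitem__)
        r = arm_means[a] + rng.gauss(0.0, 0.1)

        # 3. GPI step: incremental evaluation of the sampled action;
        #    improvement is implicit in the greedy choice above.
        q[a] += 0.1 * (r - q[a])

        # 4. Data distribution step: shift probability mass toward
        #    indices whose data earned higher reward (assumed rule).
        counts[i] += 1
        scores[i] += (r - scores[i]) / counts[i]
        dist = softmax(scores, temperature=0.1)

    return q, dist


if __name__ == "__main__":
    q_values, index_dist = run_gdi_sketch()
    print("Q estimates:", q_values)
    print("Distribution over behavior-policy indices:", index_dist)
```

Fixing `dist` to a constant, which removes step 4, reduces the loop to ordinary value-based learning with a fixed behavior policy mixture. In this toy setting, that is the sense in which a method without data distribution optimization appears as a special case.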
@InProceedings{pmlr-v162-fan22c,
title = {Generalized Data Distribution Iteration},
author = {Fan, Jiajun and Xiao, Changnan},
booktitle = {Proceedings of the 39th International Conference on Machine Learning},
pages = {6103--6184},
year = {2022},
volume = {162},
series = {Proceedings of Machine Learning Research},
month = {17--23 Jul},
publisher = {PMLR},
pdf = {https://proceedings.mlr.press/v162/fan22c/fan22c.pdf},
url = {https://proceedings.mlr.press/v162/fan22c.html}
}