Ranking Policy Gradient (RPG) is a sample-efficient off-policy policy gradient method that learns the optimal ranking of actions to maximize the return. RPG has the following practical advantages:
- It is a sample-efficient model-free algorithm for learning deterministic policies.
- Any exploration algorithm can be easily incorporated to further improve the sample efficiency of RPG.
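To give a feel for the ranking view of action selection, the following is a minimal, hypothetical sketch (not the code in this repository): each action gets a score, and the pairwise ranking policy models the probability that an action out-ranks every other action via sigmoids of score differences, in the spirit of the pairwise formulation in the preprint. The function name and the use of raw NumPy are assumptions for illustration only.

```python
import numpy as np

def pairwise_ranking_probs(scores):
    """Hypothetical sketch: for each action i, multiply the pairwise
    probabilities sigmoid(score_i - score_j) over all j != i, giving an
    (unnormalized) probability that action i out-ranks every other action."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    probs = np.ones(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                # sigmoid(score_i - score_j): chance that i beats j
                probs[i] *= 1.0 / (1.0 + np.exp(scores[j] - scores[i]))
    return probs

p = pairwise_ranking_probs([2.0, 1.0, 0.0])
print(p.argmax())  # the highest-scored action out-ranks the rest
```

The deterministic policy then simply takes the arg-max of these ranking scores; normalizing `probs` would turn them into a proper distribution if stochastic sampling were desired.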
This codebase contains an implementation of RPG built on the Dopamine framework. The preprint of the RPG paper is available here.
Follow the installation instructions of the Dopamine framework for Ubuntu or Mac OS X.
Download the RPG source:

```shell
git clone git@github.com:illidanlab/rpg.git
cd ./rpg/dopamine
```
Then train an RPG agent, e.g. on Pong:

```shell
python -um dopamine.atari.train \
  --agent_name=rpg \
  --base_dir=/tmp/dopamine \
  --random_seed 1 \
  --game_name=Pong \
  --gin_files='dopamine/agents/rpg/configs/rpg.gin'
```
To reproduce the results in the paper, please refer to the instructions here.
If you use this RPG implementation in your work, please consider citing the following paper:
```
@article{lin2019ranking,
  title={Ranking Policy Gradient},
  author={Lin, Kaixiang and Zhou, Jiayu},
  journal={arXiv preprint arXiv:1906.09674},
  year={2019}
}
```