Ranking Policy Gradient (RPG) is a sample-efficient off-policy policy gradient method that learns the optimal ranking of actions to maximize the return. RPG has the following practical advantages:
- It is a sample-efficient model-free algorithm for learning deterministic policies.
- Any exploration algorithm can be easily incorporated to further improve the sample efficiency of RPG.
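To give a feel for the ranking view of action selection, the following is a minimal, hypothetical sketch (not the code in this repository): each action gets a score, and the pairwise ranking policy models the probability that an action out-ranks every other action via sigmoids of score differences, in the spirit of the pairwise formulation in the preprint. The function name and the use of raw NumPy are assumptions for illustration only.

```python
import numpy as np

def pairwise_ranking_probs(scores):
    """Hypothetical sketch: for each action i, multiply the pairwise
    probabilities sigmoid(score_i - score_j) over all j != i, giving an
    (unnormalized) probability that action i out-ranks every other action."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    probs = np.ones(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                # sigmoid(score_i - score_j): chance that i beats j
                probs[i] *= 1.0 / (1.0 + np.exp(scores[j] - scores[i]))
    return probs

p = pairwise_ranking_probs([2.0, 1.0, 0.0])
print(p.argmax())  # the highest-scored action out-ranks the rest
```

The deterministic policy then simply takes the arg-max of these ranking scores; normalizing `probs` would turn them into a proper distribution if stochastic sampling were desired.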
This codebase contains an implementation of RPG built on the Dopamine framework. The preprint of the RPG paper is available here.
Follow the installation instructions of the Dopamine framework for Ubuntu or Mac OS X.
Download the RPG source:

```shell
git clone git@github.com:illidanlab/rpg.git
cd ./rpg/dopamine
```
Then train an RPG agent, e.g. on Pong:

```shell
python -um dopamine.atari.train \
  --agent_name=rpg \
  --base_dir=/tmp/dopamine \
  --random_seed 1 \
  --game_name=Pong \
  --gin_files='dopamine/agents/rpg/configs/rpg.gin'
```
To reproduce the results in the paper, please refer to the instructions here.
If you use this RPG implementation in your work, please consider citing the following paper:
```
@article{lin2019ranking,
  title={Ranking Policy Gradient},
  author={Lin, Kaixiang and Zhou, Jiayu},
  journal={arXiv preprint arXiv:1906.09674},
  year={2019}
}
```