This package includes the code used to run the experiments in the paper "Scalable methods for computing state similarity in deterministic Markov Decision Processes".
http://arxiv.org/abs/1911.09291
All of the commands below are run from the parent google_research
directory.
It is recommended to start a virtualenv before running these commands:
virtualenv venv
source venv/bin/activate
Then install all necessary packages:
sudo apt install python-tk
pip install -r bisimulation_aaai2020/requirements.txt
The code provided is a more general than what is necessary in the paper (it supports arbitrary grid shapes and programatically constructed square grids). We limit ourselves to describing what is necessary for reproducing the paper results.
We used the grid file specified in grid_world/configs/mirrored_rooms.grid
and the gin-config file grid_world/configs/mirrored_rooms.gin
.
To run:
python -m bisimulation_aaai2020.grid_world.compute_metric \
--base_dir=/tmp/grid_world \
--grid_file=bisimulation_aaai2020/grid_world/configs/mirrored_rooms.grid \
--gin_files=bisimulation_aaai2020/grid_world/configs/mirrored_rooms.gin \
--nosample_distance_pairs
The code provided allows you to load a trained Dopamine Rainbow agent and train an on-policy bisimulation metric approximant on the state representations.
First we need to obtain the trained agent checkpoints. The Dopamine authors have provided us with the trained checkpoints for the Rainbow agent on the 3 games evaluated in Section 4.3 of their paper. They can be downloaded at the following URLs:
- SpaceInvaders
- Pong
- Asterix
Save these checkpoints in /tmp/dopamine/trained_agents/${GAME}/
and
rename them to remove the colab_samples_rainbow_SpaceInvaders_v4_checkpoints_
prefix.
GAME=SpaceInvaders
python -m bisimulation_aaai2020.dopamine.play \
--base_dir=/tmp/dopamine/trained_metrics/${GAME} \
--trained_checkpoint=/tmp/dopamine/trained_agents/${GAME}/tf_ckpt-199 \
--gin_files=bisimulation_aaai2020/dopamine/configs/rainbow.gin \
--gin_bindings="atari_lib.create_atari_environment.game_name='${GAME}'"
This will load the trained bisimulation metric approximant, start an evaluation run, and report the bisimulation distances from a specified start frame to every other frame in the episode, generating a set of .png and .pdf files along the way. It will also try to generate a video compiling all the .png files.
You can download the trained metrics from the paper at the following URLs:
- SpaceInvaders
- Pong
- Asterix
Save these checkpoints in /tmp/dopamine/trained_metrics/${GAME}/checkpoints/
and
run the following command:
GAME=SpaceInvaders
python -m bisimulation_aaai2020.dopamine.evaluate \
--base_dir=/tmp/dopamine/evals/${GAME} \
--gin_bindings="atari_lib.create_atari_environment.game_name='${GAME}'" \
--metric_checkpoint=/tmp/dopamine/trained_metrics/${GAME}/checkpoints/tf_ckpt-4
You can see a high-level explanatory video here, which includes the bisimulation metric approximants for a trained Rainbow agent playing SpaceInvaders, Pong, and Asterix.
If you would like to cite the paper/code, please use the following BibTeX entry:
@inproceedings{castro20bisimulation,
author = {Pablo Samuel Castro},
title = {Scalable methods for computing state similarity in deterministic {M}arkov {D}ecision {P}rocesses},
year = {2020},
booktitle = {Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)},
}