Yet Another Reinforcement Learning Package
Implementations of `CEM
</yarlp/agent/cem_agent.py>`__,
`REINFORCE
</yarlp/agent/pg_agents.py>`__,
`TRPO
</yarlp/agent/trpo_agent.py>`__,
`DDQN
</yarlp/agent/ddqn_agent.py>`__,
and `A2C
</yarlp/agent/a2c_agent.py>`__, with reproducible benchmarks.
Experiments are templated using jsonschema, and results are compared to
published results. This is meant to be a starting point for working
implementations of classic RL algorithms; unfortunately, even
implementations from OpenAI baselines are not always reproducible.
A working Dockerfile with yarlp installed can be run with:

```bash
docker build -t "yarlpd" .
docker run -it yarlpd bash
```
To run a benchmark, simply:

```bash
python yarlp/experiment/experiment.py --help
```
If you want to run things manually, look in examples or start from this snippet:

```python
from yarlp.agent.trpo_agent import TRPOAgent
from yarlp.utils.env_utils import NormalizedGymEnv

# Normalized wrapper around the Gym environment
env = NormalizedGymEnv('MountainCarContinuous-v0')

# Train TRPO with a fixed seed for reproducibility
agent = TRPOAgent(env, seed=123)
agent.train(max_timesteps=1000000)
```
We benchmark against published results and OpenAI
`baselines <https://github.com/openai/baselines>`__ where available,
using `yarlp/experiment/experiment.py </yarlp/experiment/experiment.py>`__.
Benchmark scripts for the OpenAI baselines were made ad hoc, such as this one.
```bash
python yarlp/experiment/experiment.py run_atari10m_ddqn_benchmark
```
I trained 6 Atari environments for 10M time-steps (40M frames) using 1 random seed, since I only have 1 GPU and limited time on this Earth. I used DDQN with dueling networks, but no prioritized replay (although it's implemented). I compare the final mean 100-episode raw scores for yarlp (with exploration of 0.01) against results from Hasselt et al., 2015 and Wang et al., 2016, which train for 200M frames and evaluate over 100 episodes (with exploration of 0.05).
I don't compare to OpenAI baselines because the OpenAI DDQN implementation is not able to reproduce published results as of 2018-01-20. See this GitHub issue, although I found these benchmark plots to be pretty helpful.
env | yarlp DUEL 40M Frames | Hasselt et al DDQN 200M Frames | Wang et al DUEL 200M Frames |
---|---|---|---|
BeamRider | 8705 | 7654 | 12164 |
Breakout | 423.5 | 375 | 345 |
Pong | 20.73 | 21 | 21 |
QBert | 5410.75 | 14875 | 19220.3 |
Seaquest | 5300.5 | 7995 | 50245.2 |
SpaceInvaders | 1978.2 | 3154.6 | 6427.3 |
(DDQN learning-curve plots: BeamRiderNoFrameskip-v4, BreakoutNoFrameskip-v4, PongNoFrameskip-v4, QbertNoFrameskip-v4, SeaquestNoFrameskip-v4, SpaceInvadersNoFrameskip-v4.)
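For clarity, the "final mean 100 episode raw score" reported above is just the average undiscounted return over the last 100 training episodes. Here is a minimal sketch of that computation; ``episode_rewards`` is a stand-in for whatever list of per-episode returns the training loop logs, not a yarlp API:

```python
import numpy as np

def final_mean_100_episode_score(episode_rewards):
    # Mean raw (undiscounted) score over the last 100 completed episodes
    return float(np.mean(episode_rewards[-100:]))

# Toy example with 150 fake episode returns
print(final_mean_100_episode_score([12.0, 15.0, 9.0] * 50))
```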
```bash
python yarlp/experiment/experiment.py run_atari10m_a2c_benchmark
```
A2C was run for 10M time-steps (40M frames) with 1 random seed. Results are compared to the learning curves from Mnih et al., 2016, extracted from Figure 3 at 10M time-steps. You are invited to run multiple seeds and the full 200M frames for a better comparison.
env | yarlp A2C 40M | Mnih et al A3C 40M 16-threads |
---|---|---|
BeamRider | 3150 | ~3000 |
Breakout | 418 | ~150 |
Pong | 20 | ~20 |
QBert | 3644 | ~1000 |
SpaceInvaders | 805 | ~600 |
(A2C learning-curve plots: BeamRiderNoFrameskip-v4, BreakoutNoFrameskip-v4, PongNoFrameskip-v4, QbertNoFrameskip-v4, SeaquestNoFrameskip-v4, SpaceInvadersNoFrameskip-v4.)
Here are some more plots from OpenAI to compare against.
```bash
python yarlp/experiment/experiment.py run_mujoco1m_benchmark
```
We average over 5 random seeds instead of 3 for both baselines and yarlp;
more seeds probably wouldn't hurt here. We report 95% confidence intervals.
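The interval computation is nothing exotic; a normal-approximation 95% confidence interval over per-seed results might look like the sketch below (the function and variable names are illustrative, not the actual plotting code):

```python
import numpy as np

def mean_and_95ci(per_seed_returns):
    # One summary value per random seed, e.g. the mean episode return of each run
    x = np.asarray(per_seed_returns, dtype=np.float64)
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(len(x))  # standard error of the mean
    return mean, mean - 1.96 * sem, mean + 1.96 * sem

# Example with 5 seeds
print(mean_and_95ci([91.2, 88.7, 93.4, 90.1, 89.9]))
```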
CLI convenience scripts will be installed with the package:

- Run a benchmark: ``python yarlp/experiment/experiment.py --help``
- Plot yarlp compared to OpenAI baselines benchmarks: ``compare_benchmark <yarlp-experiment-dir> <baseline-experiment-dir>``
- Experiments can be defined using json and validated with jsonschema (see the sketch after this list). See ``experiment_configs`` for sample experiment configs. If multiple values are specified for a parameter, a grid search is run, with the runs executed in parallel. Example: ``run_yarlp_experiment --spec-file experiment_configs/trpo_experiment_mult_params.json``
- Experiment plots: ``make_plots <experiment-dir>``
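As an illustration of the jsonschema-based experiment templating mentioned above, the snippet below validates a toy spec against a hand-written schema. The schema and field names here are invented for the example and are not yarlp's actual experiment schema:

```python
import jsonschema

# Hypothetical schema, for illustration only -- not yarlp's real experiment schema
EXPERIMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "agent": {"type": "string"},
        "env": {"type": "string"},
        "seeds": {"type": "array", "items": {"type": "integer"}},
        "max_timesteps": {"type": "integer", "minimum": 1},
    },
    "required": ["agent", "env"],
}

spec = {
    "agent": "TRPOAgent",
    "env": "MountainCarContinuous-v0",
    "seeds": [123, 456],
    "max_timesteps": 1000000,
}

# Raises jsonschema.ValidationError if the spec does not match the schema
jsonschema.validate(instance=spec, schema=EXPERIMENT_SCHEMA)
print("spec is valid")
```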