A JAX/stax implementation of the Nature paper "Human-level control through deep reinforcement learning" [1].
The agent at dqn/agent.py implements the bsuite.baselines.base.Agent interface.
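As a rough illustration of what implementing that interface involves, the sketch below defines a local stand-in with the two methods that bsuite's Agent declares (select_action and update) and a trivial agent on top of it. The stand-in class, RandomAgent, and its `updates` counter are hypothetical and are not taken from this repo; a real agent would subclass bsuite.baselines.base.Agent and receive dm_env.TimeStep objects.

```python
import abc
import random


class Agent(abc.ABC):
    """Local stand-in mirroring the two abstract methods of bsuite's Agent."""

    @abc.abstractmethod
    def select_action(self, timestep):
        """Return an action for the given timestep."""

    @abc.abstractmethod
    def update(self, timestep, action, new_timestep):
        """Learn from one transition (timestep, action, new_timestep)."""


class RandomAgent(Agent):
    """Hypothetical agent that ignores observations and acts uniformly at random."""

    def __init__(self, num_actions, seed=0):
        self._num_actions = num_actions
        self._rng = random.Random(seed)
        self.updates = 0  # counts calls to update(), for illustration only

    def select_action(self, timestep):
        return self._rng.randrange(self._num_actions)

    def update(self, timestep, action, new_timestep):
        # A learning agent would store the transition and take a training step here.
        self.updates += 1
```

A training loop then only needs to alternate between select_action and update, regardless of which agent sits behind the interface.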
The training loop in dqn/train.py interfaces with a dm_env.Environment.
We wrap the Gym Atari suite, using the bsuite.utils.gym_wrapper.DMEnvFromGym adapter, into a dqn.AtariEnv that implements observation history (stacking the most recent frames) and action repeat.
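The idea behind observation history and action repeat can be sketched with a toy wrapper. This is a minimal illustration under stated assumptions, not the code of dqn.AtariEnv: CountingEnv, HistoryAndRepeat, the reset/step signatures, and the default values of `repeat` and `history` are all hypothetical, and the sketch omits details such as pooling over consecutive frames.

```python
from collections import deque


class CountingEnv:
    """Hypothetical toy environment: the observation is the step counter, reward is 1 per step."""

    def __init__(self, episode_length=10):
        self._episode_length = episode_length
        self._t = 0

    def reset(self):
        self._t = 0
        return self._t

    def step(self, action):
        self._t += 1
        done = self._t >= self._episode_length
        return self._t, 1.0, done


class HistoryAndRepeat:
    """Repeats each action `repeat` times and returns the last `history` observations."""

    def __init__(self, env, repeat=4, history=4):
        self._env = env
        self._repeat = repeat
        self._frames = deque(maxlen=history)

    def reset(self):
        first = self._env.reset()
        # Fill the history with copies of the first frame so the stack is always full.
        for _ in range(self._frames.maxlen):
            self._frames.append(first)
        return tuple(self._frames)

    def step(self, action):
        total_reward, done = 0.0, False
        for _ in range(self._repeat):
            obs, reward, done = self._env.step(action)
            total_reward += reward  # rewards of the skipped frames are summed
            if done:
                break
        # Only the last frame of the repeated action enters the history.
        self._frames.append(obs)
        return tuple(self._frames), total_reward, done
```

With `repeat=4`, the agent chooses an action once every four environment steps, and each observation it receives carries the last four frames it has seen.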
Implementation status of some of the techniques used in the paper:
- Experience replay [2]
- Target network [1]
- Reward clipping [1]
- Linear ε annealing [6]
- Frame skipping [5]
- Bellman error clipping [1]
- Consecutive no-ops prevention [1]
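Three of the techniques listed above are small enough to sketch in plain Python. The function names and default values below are illustrative assumptions rather than this repo's API, and Bellman error clipping is shown in one common interpretation: a loss that is quadratic for small errors and linear for large ones (the Huber loss), which keeps the gradient with respect to the error within [-bound, bound].

```python
def linear_epsilon(step, start=1.0, end=0.1, decay_steps=1_000_000):
    """Linearly anneal epsilon from `start` to `end` over `decay_steps`, then hold it at `end`."""
    fraction = min(max(step, 0) / decay_steps, 1.0)
    return start + fraction * (end - start)


def clip_reward(reward, bound=1.0):
    """Clip a raw reward to the interval [-bound, bound]."""
    return max(-bound, min(bound, reward))


def td_error(reward, discount, next_q_target, q_value, done):
    """One-step error: r + discount * max_a' Q_target(s', a') - Q(s, a).

    `next_q_target` holds the target network's action values for the next state.
    """
    bootstrap = 0.0 if done else discount * max(next_q_target)
    return reward + bootstrap - q_value


def clipped_error_loss(error, bound=1.0):
    """Quadratic for |error| <= bound and linear beyond it (Huber loss)."""
    magnitude = abs(error)
    if magnitude <= bound:
        return 0.5 * error ** 2
    return bound * (magnitude - 0.5 * bound)
```

For example, `clipped_error_loss(td_error(clip_reward(7.0), 0.99, [0.0, 2.0], 1.5, False))` chains reward clipping, the target-network bootstrap, and error clipping for a single transition.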
To run the algorithm on a GPU, I suggest installing the GPU version of jax
[4]. You can then install this repo using Anaconda Python and pip:
conda create -n dqn python
conda activate dqn
pip install git+https://github.com/epignatelli/human-level-control-through-deep-reinforcement-learning
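After installing, one way to check which backend JAX picked up is the small helper below. The function name jax_backend is hypothetical; it relies only on jax.default_backend(), and returns None when JAX cannot be imported.

```python
def jax_backend():
    """Return the name of JAX's default backend (e.g. 'cpu' or 'gpu'), or None if JAX is missing."""
    try:
        import jax
    except ImportError:
        return None
    return jax.default_backend()


print(jax_backend())
```

If this prints cpu on a machine with a GPU, the CPU-only build of jax is most likely the one installed.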