This repo replicates the results Hester et al. obtained:
Deep Q-Learning from Demonstrations
This repo is based on the fantastic repo from Morikatron/DQfD
The code builds on OpenAI baselines. The original code and the related paper from OpenAI can be found here.
The algorithms, hyperparameters, etc. follow the paper as closely as possible.
Visit https://tech.morikatron.ai/entry/2020/04/15/100000 for Morikatron's great blog post about this algorithm.
This algorithm learns much faster than vanilla deep Q-learning and uses very little demo data (5 to 10 episodes). After 1.5 million frames, it reached a mean score of 30 on Atari Breakout, whereas double DQN with prioritized experience replay takes about 20 million frames to reach a comparable score.
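Much of that sample efficiency comes from DQfD's large-margin supervised loss, which during pre-training pushes the demonstrated action's Q-value above every other action by at least a margin. Here is a minimal sketch in TensorFlow 2; the function and argument names are illustrative, not this repo's API:

```python
import tensorflow as tf

# Sketch of DQfD's large-margin classification loss (Hester et al.):
# J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E),
# where l(a_E, a) is 0 for the demonstrated action a_E and a positive margin otherwise.
def large_margin_loss(q_values, demo_actions, margin=0.8):
    num_actions = tf.shape(q_values)[1]
    one_hot = tf.one_hot(demo_actions, num_actions)        # [batch, actions]
    penalties = margin * (1.0 - one_hot)                   # margin for non-demo actions
    q_demo = tf.reduce_sum(q_values * one_hot, axis=1)     # Q(s, a_E)
    return tf.reduce_mean(tf.reduce_max(q_values + penalties, axis=1) - q_demo)
```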
One of its best episodes after 1.5 million steps:
It dug a tunnel through the bricks, which I never did in my demo.
See here for the full episode.
Required libraries
- TensorFlow 2 (tensorflow-gpu when using a GPU)
- gym
- tqdm
- dill
If you don't use a GPU, replace
with tf.device('/GPU:0'):
in dqfd.py with
with tf.device('/CPU:0'):
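Alternatively, you can pick the device at runtime instead of editing the file. A small sketch, not part of this repo (on TF 2.0 the device-listing call lives under tf.config.experimental):

```python
import tensorflow as tf

# Use the GPU when one is visible, otherwise fall back to the CPU.
gpus = tf.config.experimental.list_physical_devices('GPU')
device = '/GPU:0' if gpus else '/CPU:0'
with tf.device(device):
    pass  # build and train the model here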
Clone repo:
git clone https://github.com/Kokkini/DQfD.git
Create and activate a virtual environment
conda create -n DQfDenv
conda activate DQfDenv
Install required libraries
pip install tensorflow==2.0.0
(pip install tensorflow-gpu when using a GPU)
pip install gym
pip install tqdm
pip install dill
First, run make_demo.py to create a demo. Your demo will be saved in the ./data/demo directory.
python make_demo.py --env=BreakoutNoFrameskip-v4
- w, s, a, d: move
- SPACE: jump
- Plus (+) on numpad: increase game speed
- Minus (-) on numpad: decrease game speed
- Each episode is automatically saved when it ends (done=True)
- backspace: reset the current episode without saving
- enter: save the current episode and begin another one (use this when you want to save the episode without waiting for it to end)
- esc: end the collection of demo episodes (the current episode will not be saved)
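To sanity-check a recorded demo, you can load it back with dill. The filename below is only an example; look in ./data/demo for the names make_demo.py actually writes:

```python
import dill

# Load one saved demo episode and inspect it (the filename is an example;
# the episode's exact structure is defined by make_demo.py).
with open('./data/demo/episode_0.pkl', 'rb') as f:
    episode = dill.load(f)
print(type(episode))
```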
After collecting demo episodes, run run_atari.py to start learning:
python run_atari.py --pre_train_timesteps=1e5 --num_timesteps=4e6
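Here pre_train_timesteps is phase one of DQfD (gradient updates on demonstration data only, with no environment interaction) and num_timesteps is phase two (acting in the environment while the demo transitions remain in the replay buffer for the whole run). A schematic sketch of that schedule, with every name a hypothetical stand-in rather than this repo's API:

```python
def dqfd_schedule(env, replay, train_step, policy, pre_train_steps, total_steps):
    """Schematic two-phase DQfD loop; all arguments are hypothetical stand-ins."""
    # Phase 1: pre-train on demonstrations only.
    for _ in range(pre_train_steps):
        train_step(replay.sample_demo_only())

    # Phase 2: act in the environment; demo transitions are never evicted.
    obs = env.reset()
    for _ in range(total_steps):
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)
        replay.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs
        train_step(replay.sample())
```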
If you don't want to create your own demo data, you can download the following demo data.
My demo data for 7 episodes of Breakout (my max score is 30):
https://drive.google.com/file/d/15pXp-kwY_wFn2Eq6XRZkgQxdXLvZwcNn/view?usp=sharing
Place the downloaded pkl file in the DQfD/data/demo directory. You can then start training without collecting your own demo episodes.
If an "OMP: Error" appears on macOS, add the following to the top of dqfd.py:
import os
# Allow duplicate OpenMP runtimes (works around the libomp clash on macOS)
os.environ['KMP_DUPLICATE_LIB_OK']='TRUE'