- Implement prioritised experience replay properly
- Implement an evolutionary algorithm
- Write tests
- Implement a decayed learning rate
Later: implement algorithms from the Berkeley Deep RL course
This (WIP) repository contains:
- PyTorch implementations of deep reinforcement learning algorithms
- Analysis of algorithm performance in different game environments
The Deep Q Learning algorithm was implemented, including experience replay. With experience replay the agent stores its experiences and then randomly samples from them during training; the purpose of this is to decorrelate the data used to train the policy.
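As a rough sketch of this idea (not necessarily the buffer used in this repository), a uniform replay buffer can be written as below; the class name `ReplayBuffer` and its field names are illustrative:

```python
import random
from collections import deque, namedtuple

# One stored transition; field names are illustrative, not the repository's API.
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer that stores experiences and returns uniform random mini-batches."""

    def __init__(self, buffer_size=100_000, batch_size=64):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences are discarded first
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        """Store a new experience tuple."""
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        """Draw a random mini-batch, which decorrelates consecutive transitions."""
        return random.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```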
The Deep Q Learning algorithm above was extended to include Fixed Q-Targets. With Fixed Q-Targets, the Q-targets are calculated with a separate neural network rather than the policy network; this target network is effectively an older copy of the policy, which generally has the effect of stabilising training.
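A minimal sketch of the target calculation, assuming a policy network `policy_net`, a lagging copy `target_net` (e.g. created with `copy.deepcopy(policy_net)`), and a soft-update rate `tau` (all names are illustrative, not the repository's API):

```python
import torch

def soft_update(policy_net, target_net, tau=1e-3):
    """Slowly move the target network towards the policy network:
    theta_target <- tau * theta_policy + (1 - tau) * theta_target."""
    for target_param, policy_param in zip(target_net.parameters(), policy_net.parameters()):
        target_param.data.copy_(tau * policy_param.data + (1.0 - tau) * target_param.data)

def fixed_q_targets(target_net, rewards, next_states, dones, gamma=0.99):
    """Compute Q-targets with the older target network instead of the policy network."""
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1, keepdim=True)[0]
    return rewards + gamma * max_next_q * (1 - dones)
```

Because the target network changes slowly, the targets the policy network is regressed towards do not shift with every gradient step, which is what stabilises training.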
Using Fixed Q-Targets together with experience replay makes this algorithm similar to the one used in DeepMind's 2013 Atari paper.
The Deep Q Learning algorithm was also extended to a Double Deep Q Learning algorithm. When calculating the Q-value of the best next action, two different networks are used (one to select the maximising action and one to evaluate that action's Q-value) rather than a single network. This helps alleviate over-estimation of the Q-values and is described in this paper.
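A minimal sketch of the Double Deep Q Learning target, assuming the same illustrative `policy_net` / `target_net` pair as above:

```python
import torch

def double_dqn_targets(policy_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN: the policy network selects the best next action,
    while the target network evaluates that action's Q-value."""
    with torch.no_grad():
        best_actions = policy_net(next_states).argmax(dim=1, keepdim=True)  # action selection
        next_q = target_net(next_states).gather(1, best_actions)            # action evaluation
    return rewards + gamma * next_q * (1 - dones)
```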
Each agent plays each game 10 times, and the median result is shown below.
To train the agents on the Unity Banana environment and see their results:
- Navigate to Deep_RL_Implementations/
- On the command line run:
- pip install -r requirements.txt
- cd Results/Unity_Banana_Environment
- jupyter notebook
- Then open the notebook Unity_Banana_Environment_Results.ipynb and run all cells
- Results for the 3 agents are given below: