An implementation of the Normalized Advantage Function Reinforcement Learning Algorithm with Prioritized Experience Replay
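NAF makes Q-learning tractable for continuous actions by restricting the advantage to a quadratic function of the action, so the greedy action is simply the network's predicted mean. Below is a minimal NumPy sketch of that decomposition, Q(s, a) = V(s) + A(s, a) with A(s, a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)); the function and variable names are illustrative, not taken from this repo.

```python
import numpy as np

def naf_q_value(state_value, mu, L, action):
    """Sketch of the NAF decomposition Q(s, a) = V(s) + A(s, a).

    state_value: V(s), scalar output of the value head
    mu:          mu(s), greedy action predicted by the network
    L:           lower-triangular matrix output, used to form P(s) = L L^T
    action:      the action whose Q-value we want
    """
    P = L @ L.T                           # positive semi-definite matrix
    diff = action - mu
    advantage = -0.5 * diff @ P @ diff    # quadratic advantage, maximal at a = mu(s)
    return state_value + advantage

# Example with a 2-D action space: the greedy action a = mu(s) attains Q = V(s).
V = 1.3
mu = np.array([0.2, -0.5])
L = np.tril(np.array([[1.0, 0.0], [0.3, 0.8]]))
print(naf_q_value(V, mu, L, mu))                    # 1.3
print(naf_q_value(V, mu, L, np.array([0.9, 0.1])))  # smaller than 1.3
```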
- The original paper this code implements: https://arxiv.org/abs/1603.00748
- The code is mainly based on: https://github.com/carpedm20/NAF-tensorflow/
- Additionally, I added prioritized experience replay: https://arxiv.org/abs/1511.05952
- Using the OpenAI Baselines implementation (see the usage sketch below): https://github.com/openai/baselines/blob/master/baselines/deepq/replay_buffer.py
Thanks, OpenAI and Kim!
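For reference, the sketch below shows how the prioritized buffer from the linked baselines file is typically driven during training; the buffer size, the alpha/beta values, the dummy transitions, and the random TD errors are illustrative placeholders for the real environment and critic, not code from this repo.

```python
import numpy as np
from baselines.deepq.replay_buffer import PrioritizedReplayBuffer

# alpha controls how strongly the TD-error priorities skew the sampling distribution.
buffer = PrioritizedReplayBuffer(size=100_000, alpha=0.6)

# Store a few dummy transitions (obs, action, reward, next_obs, done).
for _ in range(100):
    obs, next_obs = np.random.randn(3), np.random.randn(3)
    action, reward, done = np.random.randn(1), np.random.rand(), 0.0
    buffer.add(obs, action, reward, next_obs, done)

# Sample with importance-sampling weights; beta is usually annealed towards 1.0.
obs_b, act_b, rew_b, next_obs_b, done_b, weights, idxes = buffer.sample(32, beta=0.4)

# In the real training loop the new priorities come from the critic's TD errors;
# random values stand in for them here.
td_errors = np.random.randn(32)
buffer.update_priorities(idxes, np.abs(td_errors) + 1e-6)
```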
- Normalizing the state and action spaces, as well as the reward, is good practice (see the sketch after this list)
- Visualise as much as possible to get an intuition about the method and to spot possible bugs
- If something does not make sense, it is a bug with very high probability
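As a concrete example of the normalization tip above, here is a minimal sketch (not code from this repo) that rescales observations with running statistics and maps actions from the network's [-1, 1] output range to the environment's bounds; the class and helper names are illustrative.

```python
import numpy as np

class RunningNormalizer:
    """Keeps a running mean/variance of observations and rescales them to roughly zero mean, unit variance."""
    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps

    def update(self, x):
        # Combine the running statistics with a new batch of observations.
        batch_mean, batch_var, n = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean = self.mean + delta * n / total
        self.var = (self.var * self.count + batch_var * n
                    + delta ** 2 * self.count * n / total) / total
        self.count = total

    def __call__(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)

def scale_action(a, low, high):
    """Map an action in [-1, 1] (network output) to the environment's [low, high] range."""
    return low + 0.5 * (a + 1.0) * (high - low)

norm = RunningNormalizer(shape=(3,))
norm.update(np.random.randn(64, 3) * 5.0 + 2.0)
print(norm(np.array([[2.0, 2.0, 2.0]])))          # roughly zero after normalization
print(scale_action(np.array([0.0]), -2.0, 2.0))   # -> [0.]
```

Rewards can be handled the same way, e.g. by dividing by a fixed scale so the returns stay in a small range.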
Coding makes me happy 🙃