Enhancements:
- Agents
- Training helper functions
  - Hook functions #85
  - Add more columns to scores.txt: episodes, max and min #78
  - Improve naming of the output directories #72 #77
  - Use logger instead of print #60
  - Make train_agent_async's eval_interval optional #93
- Misc
  - Use the Gumbel-Max trick for categorical sampling on GPU #88 #104
  - Remove test arguments from links (use chainer.config instead) #100
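
For readers unfamiliar with the Gumbel-Max trick mentioned above (#88 #104), here is a minimal NumPy sketch of the idea. It is not the library's implementation. The function name `gumbel_max_sample` and its arguments are hypothetical, chosen only for illustration. The trick adds independent Gumbel(0, 1) noise to each logit and takes the argmax, which yields a sample from the corresponding categorical distribution using only element-wise operations and a reduction.

```python
import numpy as np


def gumbel_max_sample(logits, rng):
    """Sample one category index per row of `logits` (hypothetical helper).

    `logits` holds unnormalized log-probabilities with shape (batch, n_categories).
    """
    # Draw U ~ Uniform(0, 1), keeping it strictly positive so log(U) is finite.
    u = rng.uniform(low=np.finfo(np.float64).tiny, high=1.0, size=logits.shape)
    # Transform to Gumbel(0, 1) noise: G = -log(-log(U)).
    gumbel_noise = -np.log(-np.log(u))
    # argmax(logits + G) is distributed as Categorical(softmax(logits)).
    return np.argmax(logits + gumbel_noise, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = np.array([0.1, 0.2, 0.7])
    batch_logits = np.tile(np.log(probs), (10000, 1))
    samples = gumbel_max_sample(batch_logits, rng)
    # Empirical frequencies should be close to `probs`.
    print(np.bincount(samples, minlength=3) / len(samples))
```

Because every step is a batched array operation, the same pattern should carry over to GPU array libraries that mirror the NumPy interface, though that is an assumption and not something these notes state.
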
Fixes:
- Fix argument names #86
- Fix option names #71
- Fix average_loss not being updated #95
Dependency changes:
- Switch to Chainer v2 #100
Changes that can affect performance:
- train_agent_async no longer decays the learning rate by default. Use hook functions instead.
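
As a rough illustration of how learning-rate decay could be restored with a hook, here is a self-contained sketch. It assumes a hook is a callable invoked as `hook(env, agent, step)` and that the agent exposes `agent.optimizer.lr`. Both are assumptions, and `LinearDecayHook` and `set_lr` are hypothetical names, not part of the library. Check the hook functions added in #85 for the actual interface.

```python
from types import SimpleNamespace


class LinearDecayHook:
    """Hypothetical hook that linearly interpolates a value over training steps."""

    def __init__(self, total_steps, start_value, stop_value, setter):
        self.total_steps = total_steps
        self.start_value = start_value
        self.stop_value = stop_value
        self.setter = setter

    def __call__(self, env, agent, step):
        # Clamp progress to [0, 1] so the value stays at stop_value afterwards.
        fraction = min(max(step / self.total_steps, 0.0), 1.0)
        value = self.start_value + (self.stop_value - self.start_value) * fraction
        self.setter(env, agent, value)


def set_lr(env, agent, value):
    # Assumed attribute layout; adapt to the agent actually in use.
    agent.optimizer.lr = value


if __name__ == "__main__":
    # Stand-in agent used only to exercise the hook in this sketch.
    agent = SimpleNamespace(optimizer=SimpleNamespace(lr=7e-4))
    hook = LinearDecayHook(total_steps=1000, start_value=7e-4, stop_value=0.0, setter=set_lr)
    for step in (0, 500, 1000):
        hook(None, agent, step)
        print(step, agent.optimizer.lr)
```

Keeping the schedule in a hook, separate from the training loop, makes the decay explicit and easy to swap for another schedule.
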