Closed
Description
https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md
The Basic environment description lists the +0.1 and +1.0 rewards, but omits the -0.01 reward that the environment also returns.
Agent Reward Function:
+0.1 for arriving at suboptimal state.
+1.0 for arriving at optimal state.
Observed rewards:
[-0.01]
[-0.01]
[-0.01]
[0.99]
[0.]
Benchmark Mean Reward is listed as 0.94, but once the -0.01 rewards are counted the maximum achievable reward is 0.93.
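To make the arithmetic behind the 0.93 figure explicit, here is a minimal Python sketch. It assumes the -0.01 reward is applied on every step, including the step that reaches the goal (consistent with the [0.99] value above), and that the optimal state is 7 steps from the starting position. The step count and the helper name `episode_return` are assumptions for illustration, not taken from the docs page.

```python
# Reward values as reported in this issue.
STEP_PENALTY = -0.01
OPTIMAL_REWARD = 1.0


def episode_return(steps_to_goal, goal_reward=OPTIMAL_REWARD, step_penalty=STEP_PENALTY):
    """Sum per-step rewards for an episode that reaches the goal on its last step.

    Hypothetical helper: every step pays `step_penalty`, and the final step
    additionally pays `goal_reward`.
    """
    rewards = [step_penalty] * (steps_to_goal - 1) + [goal_reward + step_penalty]
    # Round to two decimals to hide floating-point noise in the sum.
    return round(sum(rewards), 2)


# Assumption: the optimal state is 7 steps away from the start.
best = episode_return(7)
print(best)  # 0.93
```

Under these assumptions no episode can score above 0.93, so a benchmark mean of 0.94 would only be reachable if the -0.01 reward were not applied.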