No, the policy seems to get stuck in a local maximum for the continuous env.
You could try to tune the hyperparameters (action_std, K_epochs, update_timestep, lr)
or use a different advantage function.
I tried changing the activations to Tanh and using the hyperparameters from other repos, but the results were not very good either.
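For anyone who wants to try the "different advantage function" route, one common option is generalized advantage estimation (GAE). The sketch below is only an illustration, not code from this repo: the function name `compute_gae`, its argument layout, and the defaults `gamma=0.99` and `lam=0.95` are all assumptions, and it uses plain Python lists so it runs without any dependencies.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout (hypothetical helper).

    rewards, values, dones are equal-length lists for steps 0..T-1.
    last_value is the critic's estimate for the state after the final step.
    Returns (advantages, returns), where returns[t] = advantages[t] + values[t].
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk the rollout backwards so each step can reuse the next step's estimate.
    for t in reversed(range(len(rewards))):
        # Do not bootstrap or propagate across an episode boundary.
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error for step t.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `lam=1` this reduces to Monte Carlo returns minus the value baseline, and with `lam=0` to the one-step TD error, so `lam` would be one more hyperparameter to tune alongside the ones listed above. Whether it actually helps on the continuous env is untested here.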
Hello.
Were you able to get >200 reward in Lunar Lander Continuous?
I'm currently at ~40,000 episodes, but the maximum reward is still only ~130.
I have no problems with the discrete env, only with the continuous one.
Can you give me some advice?