PPO for continuous env #4

zbenic · 2019-06-28T09:17:31Z

Hello.

Were you able to get >200 reward in Lunar Lander Continuous?
I'm currently at ~40,000 episodes, but the max reward is still ~130.

I have no problems with the discrete env, but I do with the continuous one.
Can you give me some advice?

nikhilbarhate99 · 2019-06-29T12:24:59Z

No, the policy seems to get stuck in a local maximum for the continuous env.
You could try to tune the hyperparameters (action_std, K_epochs, update_timestep, lr)
or use a different advantage function.

I tried changing the activations to Tanh and using the hyperparameters from other repos, but the results were not very good either.

I'll update the repo if I find good parameters.
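
As an illustration of the "different advantage function" suggestion above, one commonly used option is generalized advantage estimation (GAE). The following is only a minimal sketch, not code from this repo: the function name `compute_gae`, its arguments, and the default `gamma` and `lam` values are assumptions chosen for the example, and whether GAE actually gets Lunar Lander Continuous past 200 reward is untested here.

```python
def compute_gae(rewards, values, dones, last_value=0.0, gamma=0.99, lam=0.95):
    """Hypothetical helper: GAE advantages and value targets for one rollout.

    rewards, values, dones are equal-length sequences, one entry per timestep.
    last_value is the critic's estimate for the state after the final step
    (only used if the rollout was cut off mid-episode).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated future advantage.
    for t in reversed(range(len(rewards))):
        # Do not bootstrap across an episode boundary.
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic loss.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `lam=1.0` this reduces to Monte Carlo returns minus the value baseline, and with `lam=0.0` to the one-step TD error, so `lam` is one more knob to tune alongside action_std, K_epochs, update_timestep and lr.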

nikhilbarhate99 closed this as completed Jul 1, 2019

This was referenced Jul 8, 2020

loss.mean().backward() crash #31

Closed

in cuda train error expected dtype Double but got dtype Float #33

Closed
