I couldn't get good result for GAIL in any environments except HalfCheetah. #204
Comments
For the moment, the easiest way to fix the problem is to change the reward function and turn normalization off. See the comments here. You need to use this reward specifically: `rewards_B = -tensor.log(1.-tensor.nnet.sigmoid(scores_B))`
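The suggested reward above can be sketched in plain Python (the original snippet is Theano; the function name here is just illustrative). Note that `-log(1 - sigmoid(s))` is mathematically the softplus of `s`, so it is always non-negative:

```python
import math

def gail_reward(score):
    """GAIL-style reward from a raw discriminator logit.

    Computes r = -log(1 - sigmoid(score)), which equals
    softplus(score) = log(1 + exp(score)); written in the
    numerically stable form to avoid overflow for large |score|.
    """
    return max(score, 0.0) + math.log1p(math.exp(-abs(score)))
```

Because this form is bounded below by zero, it never punishes the agent for surviving, which matters in environments like Hopper or Walker2d where episodes end early on failure.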
This was very helpful to me. I figured out that the standard deviation of the reward from the discriminator is much higher than that of the rewards from the MuJoCo simulators. I also understood that the reward range should differ depending on the episode-end option. I finally got good results after modifying the reward function. But I'm not sure why the value network can be trained without reward normalization. And I'm wondering whether there is some reason why you normalize the reward from the discriminator, knowing that its standard deviation is so high. I think clipping is more appropriate than normalization for the discriminator's reward. Could you comment on these questions, please? Thanks!
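The clipping idea suggested above can be sketched as follows (a minimal illustration, not code from the repository; the function name and the `clip` value are hypothetical):

```python
import math

def clipped_gail_reward(score, clip=10.0):
    """Clip the discriminator reward instead of normalizing it.

    The reward -log(1 - sigmoid(score)) equals softplus(score) and is
    unbounded above, so a single confident discriminator output can
    dominate the return. Clipping caps the magnitude while preserving
    the reward's sign and scale, unlike running-std normalization.
    """
    r = max(score, 0.0) + math.log1p(math.exp(-abs(score)))  # stable softplus
    return min(r, clip)
```

Compared with dividing by a running standard deviation, clipping keeps the reward's absolute scale stable across training, which may be why it behaves better for the value network here.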
Hi, I've run into a similar problem: my results are always bad with GAIL. Can you share your experience with this problem in detail? Thank you very much!
Hi, first of all, thank you for sharing your code.
I've been trying to implement GAIL using the expert demonstrations from your Google Drive. I used the hyper-parameters from gail_experts/readme and got good results on HalfCheetah. But I got worse results than I expected on the others, such as Hopper, Ant, and Walker2d. (I couldn't test Reacher; I guess the expert data, which is only 240 KB, has some problem.) I tried again with different hyper-parameters, including the seed, but unfortunately still got the same results. So could you share the parameters you used for the environments I failed on? It would help a lot with the comparison tests for my research.