This is a GAIL baseline, which belongs to the family of Inverse Reinforcement Learning (IRL) methods.

GANs and GAIL are notoriously fragile; even the baseline code written by OpenAI is hard to train. This repository therefore provides a PyTorch implementation of GAIL with several stabilizing tricks built in (listed below, after the requirements).

Requirements:
- mujoco-py==2.0.2.13
- PyTorch==1.7.1
- See requirement.txt for more details
Because of GAIL's fragility, the following tricks are used:
- Memory: a replay buffer is used when training the generator.
- Batch normalization: states, actions, and next states are normalized; note that this trick is applied to the generator network, not the discriminator network.
- Reward function: if the discriminator's accuracy is below 0.5, it can no longer distinguish generated data from expert data, so the reward is set to the optimal reward; otherwise the reward is the one produced by the discriminator (see the sketch after this list).
- Noise: noise is added to the discriminator.
- The key to training GAIL is balancing the performance of the discriminator and the generator: an overly strong discriminator is not allowed; the discriminator should wait for the generator to catch up.
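A minimal sketch of the reward-function trick described above, assuming a PyTorch `discriminator(state, action)` module that outputs a logit (the function names and batch format here are illustrative, not the repository's actual API):

```python
import torch

def discriminator_accuracy(discriminator, expert_states, expert_actions,
                           policy_states, policy_actions):
    """Fraction of samples the discriminator classifies correctly
    (expert -> label 1, generated -> label 0)."""
    with torch.no_grad():
        d_expert = torch.sigmoid(discriminator(expert_states, expert_actions))
        d_policy = torch.sigmoid(discriminator(policy_states, policy_actions))
    correct = (d_expert > 0.5).float().sum() + (d_policy <= 0.5).float().sum()
    return (correct / (d_expert.numel() + d_policy.numel())).item()

def gail_reward(discriminator, state, action, disc_accuracy, optimal_reward):
    """Reward trick: if the discriminator cannot separate generated data from
    expert data (accuracy < 0.5), hand out the optimal reward; otherwise use
    the discriminator-based reward -log(1 - D(s, a))."""
    if disc_accuracy < 0.5:
        return torch.full((state.shape[0],), float(optimal_reward))
    with torch.no_grad():
        d = torch.sigmoid(discriminator(state, action)).squeeze(-1)
    return -torch.log(torch.clamp(1.0 - d, min=1e-8))
```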
python main.py --env_name=Hopper-v2
Note: from the command line you can only change the ==environment name==; all other parameters can only be changed in the corresponding ==yaml file==, located under =="./env_parser/"==.
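A minimal sketch of how the per-environment yaml parameters might be loaded and combined with the command-line environment name (the file layout and keys shown here are assumptions, not the repository's actual schema):

```python
import argparse
import yaml  # PyYAML

parser = argparse.ArgumentParser()
parser.add_argument("--env_name", type=str, default="Hopper-v2")
args = parser.parse_args()

# Assumed layout: one yaml file per environment under ./env_parser/.
config_path = f"./env_parser/{args.env_name}.yaml"
with open(config_path) as f:
    config = yaml.safe_load(f)  # e.g. learning rates, batch size, buffer size

print(f"Training GAIL on {args.env_name} with parameters: {config}")
```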
- Hopper-v2 (expert return = 3500)
- Ant-v2 (expert return = 5500)
- Walker2d-v2 (expert return = 4900)
The SAC package below can be used to generate expert demonstrations.
You can also download expert demonstrations via the link: Expert Demonstration
[SAC(pytorch-soft-actor-critic-master)]: https://github.com/pranz24/pytorch-soft-actor-critic
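A minimal sketch of how expert demonstrations could be collected with a trained SAC policy (the `policy.select_action` interface, episode count, and output file name are placeholders; the linked SAC repository's actual interface may differ):

```python
import gym
import numpy as np

def collect_demonstrations(policy, env_name="Hopper-v2", num_episodes=10):
    """Roll out a trained policy and save (state, action, next_state) tuples."""
    env = gym.make(env_name)
    states, actions, next_states = [], [], []
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = policy.select_action(state)  # placeholder policy interface
            next_state, _, done, _ = env.step(action)
            states.append(state)
            actions.append(action)
            next_states.append(next_state)
            state = next_state
    np.savez("expert_demonstrations.npz",
             states=np.array(states),
             actions=np.array(actions),
             next_states=np.array(next_states))
```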
Links to other GAIL implementations are as follows:
[gail-pytorch]: https://github.com/hcnoh/gail-pytorch.git
[PyTorch-RL]: https://github.com/Khrylx/PyTorch-RL.git
[imitation]: https://github.com/openai/imitation.git