Hi,
@anair13, thanks for releasing the code. Since you seem to answer AWAC questions frequently, I'm tagging you directly.
In the AWAC paper, the main benefit is that there is no "dip" in performance when switching from offline training to online training. But when I run it on the MuJoCo gym environments, it doesn't seem to benefit from pre-training on the offline dataset:
- HalfCheetah: it learns nothing; the episode returns are almost always below zero.
- Ant: it does reach near-expert performance after switching from offline to online, but there is a huge dip down to nearly zero.
- Walker2d: it also has a dip.
I ran the code in the repo at examples/awac/mujoco/awac1.py with all default settings, and pre-training on the offline data doesn't seem to help in these experiments. I also found this link in the issues (https://drive.google.com/file/d/1Qy5SYIGNwdeTHAGNjbRfuP5pSiRw8JzJ/view), and in that file the learning process also doesn't seem to profit much from the offline training.
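For reference, I'm judging this from the logged evaluation returns. Here is a minimal snippet I use to plot them, assuming the standard rlkit-style progress.csv output; the exact column names are guesses and may differ from what awac1.py actually writes:

```python
# Hypothetical plotting helper: assumes rlkit-style CSV logging.
# The "Epoch" and "evaluation/Average Returns" column names are guesses;
# adjust them to whatever columns awac1.py really logs.
import pandas as pd
import matplotlib.pyplot as plt

def plot_returns(progress_csv, column="evaluation/Average Returns"):
    df = pd.read_csv(progress_csv)
    plt.plot(df["Epoch"], df[column])
    plt.xlabel("Epoch")
    plt.ylabel("Average return")
    plt.title(progress_csv)
    plt.show()

plot_returns("path/to/experiment/progress.csv")
```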
Do I have to change any hyperparameters? It would be really nice if I could reproduce the paper's results.
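To be concrete, this is the kind of override I would try, assuming the hyperparameters are exposed through the example's variant dict; the key names and values below (beta, policy_lr, num_pretrain_steps) are only guesses and are not verified against awac1.py:

```python
# Purely illustrative: a guess at the kind of hyperparameter override I could try.
# I have NOT verified that awac1.py exposes these exact keys; they are
# placeholders for whatever the real variant dict uses.
variant_overrides = {
    "trainer_kwargs": {
        "beta": 2.0,       # AWAC advantage-weighting temperature (guess)
        "policy_lr": 3e-4, # policy learning rate (guess)
    },
    "num_pretrain_steps": 25000,  # offline pre-training steps (guess)
}

def apply_overrides(variant, overrides):
    """Recursively merge overrides into an rlkit-style variant dict."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(variant.get(key), dict):
            apply_overrides(variant[key], value)
        else:
            variant[key] = value
    return variant
```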
Looking forward to your reply.
Best.