AWAC doesn't profit from offline data

Hi, 

@anair13, thanks for making the code available. You seem to answer AWAC questions frequently, so I am tagging you directly.

In the AWAC paper, the main benefit is that there is no "dip" in performance when switching from offline training to online training. But when I run it on the MuJoCo Gym environments, it does not benefit from pre-training on the offline dataset:
- HalfCheetah: it learns nothing; the episode returns are almost always below zero.
- Ant: it reaches nearly expert performance after switching from offline to online, but it has a huge dip to nearly zero.
- Walker2d: it also has a dip.
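
To make "dip" concrete, here is a minimal sketch of one way it could be quantified from logged evaluation returns. Everything in it is an assumption for illustration: `switch_dip` is a hypothetical helper, not part of the repo, and the per-epoch return list, the switch index, and the window size are placeholders rather than values from my runs.

```python
import numpy as np


def switch_dip(returns, switch_idx, window=5):
    """Measure the drop in return around the offline-to-online switch.

    returns:    per-epoch average evaluation returns (hypothetical log format)
    switch_idx: index of the first online epoch
    window:     number of epochs inspected on each side of the switch
    """
    returns = np.asarray(returns, dtype=float)
    if not 0 < switch_idx < len(returns):
        raise ValueError("switch_idx must leave epochs on both sides")
    # Mean return over the last offline epochs before the switch.
    pre = returns[max(0, switch_idx - window):switch_idx].mean()
    # Worst return over the first online epochs after the switch.
    worst_post = returns[switch_idx:switch_idx + window].min()
    return pre, worst_post, pre - worst_post


# Made-up numbers, only to show the call; not measured results.
example_returns = [100, 100, 100, 100, 100, 10, 50, 80, 100, 120]
pre, worst_post, dip = switch_dip(example_returns, switch_idx=5)
```

A dip close to zero would correspond to the smooth transition described in the paper, while a dip close to the pre-switch return would correspond to what I see on Ant.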

I ran `examples/awac/mujoco/awac1.py` from the repo with all default settings, and pre-training on offline data does not seem to help in these experiments. I also found this link in the issues (https://drive.google.com/file/d/1Qy5SYIGNwdeTHAGNjbRfuP5pSiRw8JzJ/view); in that file the learning process also does not seem to profit much from the offline training.

Do I have to change any hyperparameters? It would be really nice if I could reproduce the paper's results.

Looking forward to your reply.

Best.

AWAC doesn't profit from offline data #166

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

AWAC doesn't profit from offline data #166

Description

Activity

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions