Hi,
When training the high-level policy in `skimo_agent.py`, `z_next_pred` is initialized from the first observation (line 616), but it is never updated after that.
Judging from the comment and the paper, there should be a call to `hl_agent.model.imagine_step` that rolls `z_next_pred` forward to the next imagined latent state, roughly like the sketch below. However, no such call exists.
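For concreteness, here is a minimal, runnable sketch of the rollout I expected, with toy `nn.Linear` modules standing in for the SkiMo encoder, latent dynamics, and high-level policy. All names, shapes, and the `imagine_step` signature here are my assumptions for illustration, not the repo's actual API:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the SkiMo components -- everything below is an
# assumption for illustration, not the repo's actual modules.
obs_dim, latent_dim, skill_dim = 32, 16, 8

encoder = nn.Linear(obs_dim, latent_dim)                      # obs -> latent z
imagine_step = nn.Linear(latent_dim + skill_dim, latent_dim)  # (z, skill) -> z'
hl_policy = nn.Linear(latent_dim, skill_dim)                  # high-level policy pi(z)

ob = torch.randn(1, obs_dim)
z_next_pred = encoder(ob)  # analogous to the initialization at line 616

for _ in range(3):  # imagined skill-level rollout
    skill = hl_policy(z_next_pred)
    # The update I believe is missing: advance the latent each step so the
    # next skill is chosen from the imagined state, not the initial encoding.
    z_next_pred = imagine_step(torch.cat([z_next_pred, skill], dim=-1))
```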
Is this a bug, or am I missing something?
Also, the code seems to condition the skill prior on the encoded ground-truth state when computing `skill_prior_loss` for the task policy, whereas the paper (Eq. 7) conditions it on the imagined state. I would like to understand the rationale: why use the imagined state for the actor loss but the ground-truth state for the prior loss?
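To make sure I'm reading the discrepancy correctly, here is a toy contrast of the two variants using unit-variance Gaussians from `torch.distributions`. The heads, shapes, and loss names are hypothetical, purely to illustrate what I mean:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

latent_dim, skill_dim = 16, 8
prior_head = nn.Linear(latent_dim, skill_dim)   # skill prior mean (unit variance for brevity)
policy_head = nn.Linear(latent_dim, skill_dim)  # task policy mean

z_gt = torch.randn(1, latent_dim)   # encoded ground-truth state (what the code uses)
z_img = torch.randn(1, latent_dim)  # imagined latent from the rollout (what Eq. 7 uses)

pi = Normal(policy_head(z_img), 1.0)  # task policy evaluated on the imagined state
prior_gt = Normal(prior_head(z_gt), 1.0)
prior_img = Normal(prior_head(z_img), 1.0)

loss_code = kl_divergence(pi, prior_gt).sum()    # prior conditioned on ground truth (code)
loss_paper = kl_divergence(pi, prior_img).sum()  # prior conditioned on imagined state (Eq. 7)
```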