
When training the high-level policy, is it a bug to use a fixed observation (the first one) while iterating in time? #6

Open
@minuk302

Description


Hi,

When training the high-level policy in skimo_agent.py, z_next_pred is initialized from the first observation (line 616) and is never updated after that.
Judging from the comment and the paper, there should be a call to hl_agent.model.imagine_step that advances z_next_pred to the next imagined step, but no such call exists.
Is this a bug, or am I missing something? A minimal sketch of the structure I would expect is below.
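For concreteness, here is a minimal, self-contained sketch of the loop structure I have in mind. All names (encoder, dynamics, actor, imagine_step) are hypothetical stand-ins rather than the actual components of skimo_agent.py; the only point is that the latent must be re-assigned on each iteration:

```python
import torch

# Hypothetical stand-ins for the model components; dimensions are arbitrary.
encoder = torch.nn.Linear(4, 8)        # observation -> latent z
dynamics = torch.nn.Linear(8 + 2, 8)   # (z, skill) -> next latent z
actor = torch.nn.Linear(8, 2)          # latent z -> skill

def imagine_step(z, skill):
    """One step of latent imagination: z_{t+1} = f(z_t, skill_t)."""
    return dynamics(torch.cat([z, skill], dim=-1))

obs = torch.randn(16, 4)               # batch of first observations
z_next_pred = encoder(obs)             # initialized from the first observation

skills = []
for _ in range(5):                     # imagination horizon
    skill = actor(z_next_pred)
    skills.append(skill)
    # Without this re-assignment, every iteration keeps conditioning on the
    # first observation's latent -- the behavior described above.
    z_next_pred = imagine_step(z_next_pred, skill)
```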

Also, the code seems to use the encoded ground-truth state for the task policy when computing skill_prior_loss, whereas the paper (Eq. 7) uses the imagined state. I would like to understand the reasoning behind this: why use the imagined step for the actor loss but the ground-truth state for the prior loss? The two variants I mean are sketched below.
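To make the two variants concrete, here is a hedged, self-contained sketch (again with hypothetical names, not the repo's actual API) contrasting a skill prior loss conditioned on the encoded ground-truth latent with one conditioned on the imagined latent, as Eq. 7 suggests:

```python
import torch

# Hypothetical stand-ins; none of these names come from the actual repo.
encoder = torch.nn.Linear(4, 8)       # observation -> latent z
dynamics = torch.nn.Linear(8 + 2, 8)  # (z, skill) -> imagined next latent
actor = torch.nn.Linear(8, 4)         # latent z -> skill dist params (mu, log_std)
prior = torch.nn.Linear(8, 4)         # latent z -> prior dist params (mu, log_std)

def dist(params):
    mu, log_std = params.chunk(2, dim=-1)
    return torch.distributions.Normal(mu, log_std.exp())

obs = torch.randn(16, 4)                            # ground-truth observations
z_gt = encoder(obs)                                 # encoded ground-truth latent
skill = dist(actor(z_gt)).rsample()
z_img = dynamics(torch.cat([z_gt, skill], dim=-1))  # imagined latent

# Variant A (what the code appears to do): condition on the ground-truth latent.
loss_gt = torch.distributions.kl_divergence(
    dist(actor(z_gt)), dist(prior(z_gt))).mean()

# Variant B (what Eq. 7 suggests): condition on the imagined latent.
loss_img = torch.distributions.kl_divergence(
    dist(actor(z_img)), dist(prior(z_img))).mean()
```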

Thank you!