Hi! Thank you for sharing the code of your amazing paper.
I managed to run your code with the BabyAI environment in the setup where weights are not shared between the value and policy networks.
I am now trying to run the experiment in the share_weights configuration, but I run into an assertion error:
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/models.py", line 234, in get_value assert self.share_weights, "Must be sharing weights to use get_value"
AssertionError: Must be sharing weights to use get_value
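From that assertion, it looks like get_value is only meant to be called when weights are shared, i.e. something along these lines (my reconstruction from the traceback only; the method body is a guess, not the actual code):

```python
def get_value(self, x):
    # Only the assert comes from the traceback; the rest of the body is a guess.
    assert self.share_weights, "Must be sharing weights to use get_value"
    ...  # presumably: run the shared trunk and return the value head's output
```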
If I manually set the share_weights parameter of the policy model to True, I run into the following error:
Traceback (most recent call last):
  File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 255, in <module>
    main(params)
  File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 124, in main
    mean_reward = p.train_step()
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 473, in train_step
    surr_loss, val_loss = self.take_steps(saps)
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 436, in take_steps
    surr_loss = self.policy_step(*args).mean()
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 289, in ppo_step
    store, old_vs=batch_old_vs, opt_step=opt_step)
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 193, in value_step
    val_opt.zero_grad()
AttributeError: 'NoneType' object has no attribute 'zero_grad'
It looks like the value optimizer (val_opt) is None when value_step is called from inside ppo_step.
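For reference, this is the kind of guard I would have expected around the optimizer call (purely a sketch of my reading of steps.py; the names and where this lives may differ):

```python
# Hypothetical sketch, not the repo's actual code: skip the standalone
# optimizer step when there is no separate value optimizer.
def apply_value_update(val_loss, val_opt):
    if val_opt is not None:
        # Separate value network: take an independent optimizer step.
        val_opt.zero_grad()
        val_loss.backward()
        val_opt.step()
    else:
        # Shared weights: no standalone optimizer, so the value loss should
        # presumably be returned and folded into the PPO surrogate loss.
        return val_loss
```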
Moreover, when I look at the value_step function in weight-sharing mode, the value loss is already computed from ppo_step on minibatches. However, when value_step is called from ppo_step, it seems to split those minibatches again into further minibatches, compute the value loss on the first split only, and then return.
In fact, value_step seems to process its input in the same way regardless of the value of share_weights. This is a bit strange, since when share_weights is False, value_step operates on a whole trajectory, whereas when share_weights is True, it receives only a single minibatch as input.
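For what it's worth, this is roughly the branching I would have expected (a hypothetical sketch only; the argument names and params.NUM_MINIBATCHES are placeholders, not the repo's actual API):

```python
import torch.nn.functional as F

def value_step(states, returns, net, val_opt, params, share_weights=False):
    # Hypothetical sketch of the behavior I expected, not the repo's code.
    if share_weights:
        # The input is already a single minibatch coming from ppo_step:
        # compute the loss on it directly and let the caller fold it into
        # the joint PPO objective (no separate optimizer step here).
        return F.mse_loss(net.get_value(states).squeeze(-1), returns)

    # Separate-network case: the input is the whole trajectory, so split it
    # into minibatches and take independent value-optimizer steps.
    for s, r in zip(states.chunk(params.NUM_MINIBATCHES),
                    returns.chunk(params.NUM_MINIBATCHES)):
        loss = F.mse_loss(net.get_value(s).squeeze(-1), r)
        val_opt.zero_grad()
        loss.backward()
        val_opt.step()
```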
Can you confirm that the weight-sharing option is supported and works?
Thank you again for sharing!