Hi! Thank you for sharing the code of your amazing paper.
I managed to run your code with the BabyAI environment in the setup where weights are not shared between the value and policy networks.
I am now trying to run the experiment in the share_weights configuration, but I run into an assertion error:
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/models.py", line 234, in get_value assert self.share_weights, "Must be sharing weights to use get_value"
AssertionError: Must be sharing weights to use get_value
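From that assertion, it looks like get_value is only meant to be called when weights are shared, i.e. something along these lines (my reconstruction from the traceback only; the method body is a guess, not the actual code):

```python
def get_value(self, x):
    # Only the assert comes from the traceback; the rest of the body is a guess.
    assert self.share_weights, "Must be sharing weights to use get_value"
    ...  # presumably: run the shared trunk and return the value head's output
```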
If I manually set the share_weights parameter of the policy model to True, I run into the following error:
Traceback (most recent call last):
  File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 255, in <module>
    main(params)
  File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 124, in main
    mean_reward = p.train_step()
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 473, in train_step
    surr_loss, val_loss = self.take_steps(saps)
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 436, in take_steps
    surr_loss = self.policy_step(*args).mean()
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 289, in ppo_step
    store, old_vs=batch_old_vs, opt_step=opt_step)
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 193, in value_step
    val_opt.zero_grad()
AttributeError: 'NoneType' object has no attribute 'zero_grad'
It looks like the value optimizer (val_opt) is None when value_step is called from inside ppo_step.
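For reference, this is the kind of guard I would have expected around the optimizer call (purely a sketch of my reading of steps.py; the names and where this lives may differ):

```python
# Hypothetical sketch, not the repo's actual code: skip the standalone
# optimizer step when there is no separate value optimizer.
def apply_value_update(val_loss, val_opt):
    if val_opt is not None:
        # Separate value network: take an independent optimizer step.
        val_opt.zero_grad()
        val_loss.backward()
        val_opt.step()
    else:
        # Shared weights: no standalone optimizer, so the value loss should
        # presumably be returned and folded into the PPO surrogate loss.
        return val_loss
```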
Moreover, when I look at the value_step function in weight-sharing mode, the value loss is already computed from ppo_step on minibatches. However, when value_step is called from ppo_step, it seems to split those minibatches again into further minibatches, compute the value loss on the first split only, and then return.
In fact, value_step seems to process its input in the same way regardless of the value of share_weights. This is a bit strange, since when share_weights is False, value_step operates on a whole trajectory, whereas when share_weights is True, it receives only a single minibatch as input.
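For what it's worth, this is roughly the branching I would have expected (a hypothetical sketch only; the argument names and params.NUM_MINIBATCHES are placeholders, not the repo's actual API):

```python
import torch.nn.functional as F

def value_step(states, returns, net, val_opt, params, share_weights=False):
    # Hypothetical sketch of the behavior I expected, not the repo's code.
    if share_weights:
        # The input is already a single minibatch coming from ppo_step:
        # compute the loss on it directly and let the caller fold it into
        # the joint PPO objective (no separate optimizer step here).
        return F.mse_loss(net.get_value(states).squeeze(-1), returns)

    # Separate-network case: the input is the whole trajectory, so split it
    # into minibatches and take independent value-optimizer steps.
    for s, r in zip(states.chunk(params.NUM_MINIBATCHES),
                    returns.chunk(params.NUM_MINIBATCHES)):
        loss = F.mse_loss(net.get_value(s).squeeze(-1), r)
        val_opt.zero_grad()
        loss.backward()
        val_opt.step()
```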
Can you confirm that the weight-sharing option is supported and works?
Thank you again for sharing!