Option share_weight does not work #4

Open
tristan-ka opened this issue Dec 7, 2020 · 0 comments
@tristan-ka

Hi!

Thank you for sharing the code of your amazing paper.

I managed to run your code with the BabyAI environment in the setup where weights are not shared between the value and the policy networks.

I am now trying to run the experiment in the share_weight configuration, but I run into an assertion error:

File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/models.py", line 234, in get_value assert self.share_weights, "Must be sharing weights to use get_value"
AssertionError: Must be sharing weights to use get_value

If I manually set the share_weight parameter of the policy model to True (roughly as in the short sketch after the traceback below), I run into the following error:

```
Traceback (most recent call last):
  File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 255, in <module>
    main(params)
  File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 124, in main
    mean_reward = p.train_step()
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 473, in train_step
    surr_loss, val_loss = self.take_steps(saps)
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 436, in take_steps
    surr_loss = self.policy_step(*args).mean()
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 289, in ppo_step
    store, old_vs=batch_old_vs, opt_step=opt_step)
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 193, in value_step
    val_opt.zero_grad()
AttributeError: 'NoneType' object has no attribute 'zero_grad'
```
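
For clarity, the manual override I mention above is roughly the following (only share_weights and train_step come from the code/traceback; the policy_model attribute path and the setup helper are my assumptions, not the repo's actual API):

```python
# Rough sketch of the manual override described above (build_agent is a
# hypothetical placeholder for however run.py constructs the agent).
p = build_agent(params)                # hypothetical setup helper
p.policy_model.share_weights = True    # force the flag on the policy network only
mean_reward = p.train_step()           # -> AttributeError shown in the traceback above
```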

It looks like the value optimizer (val_opt) is set to None when value_step is called from inside ppo_step.
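
If I read the code correctly, the failure mode boils down to something like the following (a minimal, self-contained sketch of my understanding; names and structure are simplified and are not the actual implementation):

```python
# Minimal sketch of the suspected failure mode (simplified; not the actual code).
import torch.nn as nn
import torch.optim as optim

class ToyAgent:
    def __init__(self, share_weights: bool):
        self.share_weights = share_weights
        self.policy_model = nn.Linear(4, 2)
        if share_weights:
            # With weight sharing the value head lives inside the policy model,
            # so no separate value network/optimizer is created ...
            self.val_opt = None
        else:
            self.val_model = nn.Linear(4, 1)
            self.val_opt = optim.Adam(self.val_model.parameters(), lr=1e-3)

def value_step(val_opt):
    # ... yet value_step unconditionally assumes a separate value optimizer exists.
    val_opt.zero_grad()

agent = ToyAgent(share_weights=True)
value_step(agent.val_opt)  # AttributeError: 'NoneType' object has no attribute 'zero_grad'
```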

Moreover, looking at the value_step function in weight-sharing mode, the value loss is computed from ppo_step on minibatches. However, it seems that when value_step is called from ppo_step, it splits those minibatches again into further minibatches, computes the value loss on the first split, and then returns.

It seems that the value_step function processes the data in the same way regardless of the value of the share_weight parameter. This seems a bit strange, since when share_weight is False, value_step operates on a whole trajectory, whereas when share_weight is True, it only receives a minibatch as input.
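
To make the data-flow concern concrete, here is a schematic of the call pattern as I understand it (self-contained toy code; the splitting logic and names are simplified from my reading of steps.py and are not the actual implementation):

```python
# Toy schematic of the suspected double mini-batching (simplified; not the actual code).
import numpy as np

NUM_MINIBATCHES = 4

def value_step(data):
    # value_step splits its input into NUM_MINIBATCHES chunks the same way
    # whether or not weights are shared ...
    for sub in np.array_split(data, NUM_MINIBATCHES):
        print(f"value loss computed on {len(sub)} samples")

def ppo_step(trajectory, share_weights):
    for minibatch in np.array_split(trajectory, NUM_MINIBATCHES):
        # policy surrogate loss on this minibatch (omitted)
        if share_weights:
            # ... so when called from here, the minibatch is split *again*
            # into even smaller sub-minibatches.
            value_step(minibatch)

ppo_step(np.zeros(2048), share_weights=True)  # value loss ends up on 128-sample chunks
```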

Can you confirm that the weight-sharing option is supported and working?

Thank you again for sharing!
