Adding "Principled Weight Initialization for Hypernetworks" #1

Open
OhadRubin opened this issue Aug 2, 2021 · 6 comments

@OhadRubin

Hey,
Do you think it is possible to add the initialization from https://openreview.net/forum?id=H1lma24tPB to this model?
Thanks,
Ohad

rabeehk commented Aug 9, 2021

Hi Ohad,
Thanks for the link to the paper. I read through it; yes, this is a very interesting direction, and I would be interested in contributing to this idea.
Best
Rabeeh

OhadRubin commented Aug 9, 2021

Hey,
If I understand the example they posted here (and I'm not sure I do):

import math

def hyperfanoutWi_init(i):
    # Returns an initializer for the weight of the hypernetwork's i-th output head;
    # hardcoded_hyperfanout / hardcoded_receptive presumably hold the fan-out and
    # receptive-field size of the main-network layer that head i generates.
    def hyperfanout_init(Wi):
        fan_out, fan_in = Wi.size(0), Wi.size(1)
        bound = math.sqrt(3 * 2 / (fan_in * hardcoded_hyperfanout[i]) / hardcoded_receptive(i))
        Wi.uniform_(-bound, bound)
        return Wi
    return hyperfanout_init
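
If I read that right, it is meant to be applied to each output head of the hypernetwork, roughly like this (hypernet_output_heads is just a placeholder name on my side, not something from your code):

for i, head in enumerate(hypernet_output_heads):
    hyperfanoutWi_init(i)(head.weight.data)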

I'm not sure how to implement this scheme for the LN, but for the feedforward network:
To implement the hyperfan-in init, the initialization in your code needs to be changed here:
For the FF adapter linear down layer: bound = math.sqrt(3 / (self.task_embedding_dim * self.down_sample_size))
For the FF adapter linear up layer: bound = math.sqrt(3 / (self.task_embedding_dim * self.input_dim))
For the FF adapter bias layers (both up and down): bound = math.sqrt(3 / (2 * self.task_embedding_dim))
wdyt? (rough sketch below)
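
To make it concrete, here is a rough sketch of what I have in mind for the feedforward adapter hypernetwork. To be clear, AdapterHyperNet and the *_gen attribute names are placeholders I made up rather than the actual classes/fields in this repo; it only shows where the bounds above would go:

import math
import torch
import torch.nn as nn

class AdapterHyperNet(nn.Module):
    # Hypothetical hypernetwork: maps a task embedding to the weights and
    # biases of one FF adapter (down projection + up projection).
    def __init__(self, task_embedding_dim, input_dim, down_sample_size):
        super().__init__()
        self.task_embedding_dim = task_embedding_dim
        self.input_dim = input_dim
        self.down_sample_size = down_sample_size
        # one linear "head" per generated parameter tensor
        self.down_weight_gen = nn.Linear(task_embedding_dim, input_dim * down_sample_size)
        self.up_weight_gen = nn.Linear(task_embedding_dim, down_sample_size * input_dim)
        self.down_bias_gen = nn.Linear(task_embedding_dim, down_sample_size)
        self.up_bias_gen = nn.Linear(task_embedding_dim, input_dim)
        self.init_weights()

    def init_weights(self):
        # uniform bounds proposed in the comment above
        bound_down = math.sqrt(3 / (self.task_embedding_dim * self.down_sample_size))
        bound_up = math.sqrt(3 / (self.task_embedding_dim * self.input_dim))
        bound_bias = math.sqrt(3 / (2 * self.task_embedding_dim))
        nn.init.uniform_(self.down_weight_gen.weight, -bound_down, bound_down)
        nn.init.uniform_(self.up_weight_gen.weight, -bound_up, bound_up)
        nn.init.uniform_(self.down_bias_gen.weight, -bound_bias, bound_bias)
        nn.init.uniform_(self.up_bias_gen.weight, -bound_bias, bound_bias)
        # the heads' own biases can stay at zero
        for head in (self.down_weight_gen, self.up_weight_gen,
                     self.down_bias_gen, self.up_bias_gen):
            nn.init.zeros_(head.bias)

    def forward(self, task_embedding):
        # generate the adapter parameters for a single task embedding
        w_down = self.down_weight_gen(task_embedding).view(self.down_sample_size, self.input_dim)
        w_up = self.up_weight_gen(task_embedding).view(self.input_dim, self.down_sample_size)
        b_down = self.down_bias_gen(task_embedding)
        b_up = self.up_bias_gen(task_embedding)
        return w_down, b_down, w_up, b_up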

rabeehk commented Aug 24, 2021

Hi @OhadRubin
Apologies for my delayed response. Yes, based on the code you shared, the weight initialization would need to change for all the hypernetworks. For the line you mentioned, couldn't it also be replaced with the init you described? Could you please tell me why this cannot be done for the LN? Thanks.

@OhadRubin

I think for LN it won't work because the LN weight is a multiplicative factor, so the fan-in/fan-out variance analysis for linear layers doesn't apply to it directly.

rabeehk commented Aug 24, 2021

Hi @OhadRubin, do you mind pointing to the line?
For the line you mentioned, if one initializes linear_layer.weight with the scheme you described, wouldn't that work?

@jianghaojun

@OhadRubin Hi, did you try the principled initialization method? And did it work?
