Adding "Principled Weight Initialization for Hypernetworks" #1

Open
OhadRubin opened this issue Aug 2, 2021 · 6 comments

@OhadRubin

Hey,
Do you think it is possible to add the initialization from https://openreview.net/forum?id=H1lma24tPB to this model?
Thanks,
Ohad

rabeehk commented Aug 9, 2021

Hi Ohad,
Thanks for the link to the paper. I read through it; yes, this is a very interesting direction, and I would be interested in contributing to this idea.
Best
Rabeeh

OhadRubin commented Aug 9, 2021

Hey,
If I understand the example they posted here (and I'm not sure I do):

import math

def hyperfanoutWi_init(i):
    # Returns an initializer for the weight of the hypernetwork's i-th output head;
    # hardcoded_hyperfanout / hardcoded_receptive presumably hold the fan-out and
    # receptive-field size of the main-network layer that head i generates.
    def hyperfanout_init(Wi):
        fan_out, fan_in = Wi.size(0), Wi.size(1)
        bound = math.sqrt(3 * 2 / (fan_in * hardcoded_hyperfanout[i]) / hardcoded_receptive(i))
        Wi.uniform_(-bound, bound)
        return Wi
    return hyperfanout_init
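
If I read that right, it is meant to be applied to each output head of the hypernetwork, roughly like this (hypernet_output_heads is just a placeholder name on my side, not something from your code):

for i, head in enumerate(hypernet_output_heads):
    hyperfanoutWi_init(i)(head.weight.data)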

I'm not sure how to implement this scheme for the LN, but for the feedforward network:
To implement the hyperfan-in init, the initialization in your code needs to be changed here:
For the FF adapter linear down layer: bound = math.sqrt(3 / (self.task_embedding_dim * self.down_sample_size))
For the FF adapter linear up layer: bound = math.sqrt(3 / (self.task_embedding_dim * self.input_dim))
For the FF adapter bias layers (both up and down): bound = math.sqrt(3 / (2 * self.task_embedding_dim))
wdyt? (rough sketch below)
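
To make it concrete, here is a rough sketch of what I have in mind for the feedforward adapter hypernetwork. To be clear, AdapterHyperNet and the *_gen attribute names are placeholders I made up rather than the actual classes/fields in this repo; it only shows where the bounds above would go:

import math
import torch
import torch.nn as nn

class AdapterHyperNet(nn.Module):
    # Hypothetical hypernetwork: maps a task embedding to the weights and
    # biases of one FF adapter (down projection + up projection).
    def __init__(self, task_embedding_dim, input_dim, down_sample_size):
        super().__init__()
        self.task_embedding_dim = task_embedding_dim
        self.input_dim = input_dim
        self.down_sample_size = down_sample_size
        # one linear "head" per generated parameter tensor
        self.down_weight_gen = nn.Linear(task_embedding_dim, input_dim * down_sample_size)
        self.up_weight_gen = nn.Linear(task_embedding_dim, down_sample_size * input_dim)
        self.down_bias_gen = nn.Linear(task_embedding_dim, down_sample_size)
        self.up_bias_gen = nn.Linear(task_embedding_dim, input_dim)
        self.init_weights()

    def init_weights(self):
        # uniform bounds proposed in the comment above
        bound_down = math.sqrt(3 / (self.task_embedding_dim * self.down_sample_size))
        bound_up = math.sqrt(3 / (self.task_embedding_dim * self.input_dim))
        bound_bias = math.sqrt(3 / (2 * self.task_embedding_dim))
        nn.init.uniform_(self.down_weight_gen.weight, -bound_down, bound_down)
        nn.init.uniform_(self.up_weight_gen.weight, -bound_up, bound_up)
        nn.init.uniform_(self.down_bias_gen.weight, -bound_bias, bound_bias)
        nn.init.uniform_(self.up_bias_gen.weight, -bound_bias, bound_bias)
        # the heads' own biases can stay at zero
        for head in (self.down_weight_gen, self.up_weight_gen,
                     self.down_bias_gen, self.up_bias_gen):
            nn.init.zeros_(head.bias)

    def forward(self, task_embedding):
        # generate the adapter parameters for a single task embedding
        w_down = self.down_weight_gen(task_embedding).view(self.down_sample_size, self.input_dim)
        w_up = self.up_weight_gen(task_embedding).view(self.input_dim, self.down_sample_size)
        b_down = self.down_bias_gen(task_embedding)
        b_up = self.up_bias_gen(task_embedding)
        return w_down, b_down, w_up, b_up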

rabeehk commented Aug 24, 2021

Hi @OhadRubin
Apologies for my delayed response. Yes, based on the code you shared, the weight initialization would need to change for all the hypernetworks. For the line you mentioned, couldn't it also be replaced with the init you described? Could you please tell me why this cannot be done for the LN? Thanks.

@OhadRubin

I think for LN it won't work because the LN weight is a multiplicative factor, so the fan-in/fan-out variance analysis for linear layers doesn't apply to it directly.

rabeehk commented Aug 24, 2021

Hi @OhadRubin, do you mind pointing to the line?
For the line you mentioned, if one initializes linear_layer.weight with the scheme you described, wouldn't that work?

@jianghaojun

@OhadRubin Hi, did you try the principled initialization method? And did it work?
