Why can the weights of a zero-convolution layer obtain different gradients during training? #550
-
A common understanding in deep learning is that initializing all the weights with zeros leads the neurons to learn the same features during training [1]: all the weights get exactly the same gradient and therefore receive exactly the same update, so the layer effectively has only one neuron (channel) regardless of how many it has.

It seems to me that the 1×1 zero-convolution layers used throughout the paper should suffer from this problem as well. That said, the huge success of ControlNet seems to imply that they don't. Could anyone help me understand why?
-
There is a link on the front page with an explanation, see https://github.com/lllyasviel/ControlNet/blob/main/docs/faq.md. There is also a roughly two-hour video walking through the paper: https://www.youtube.com/watch?v=Mp-HMQcB_M4
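Below is a minimal PyTorch sketch (a hypothetical toy setup, not the actual ControlNet code) of why a single zero-initialized 1×1 convolution still breaks symmetry: the gradient of each weight w[o, i] is the upstream gradient for output channel o multiplied by the input feature of channel i (summed over positions), and both of those generally differ across channels, so the weights receive distinct gradients on the very first step.

```python
# Toy illustration: a zero-initialized 1x1 conv whose output is added to another
# feature map and consumed by a fixed downstream layer. All names here are made up
# for the example; this is not the ControlNet implementation.
import torch
import torch.nn as nn

torch.manual_seed(0)

in_ch, out_ch = 4, 4
x = torch.randn(1, in_ch, 8, 8)        # nonzero input features (e.g. from the trainable copy)
skip = torch.randn(1, out_ch, 8, 8)    # feature map of the frozen model that the zero conv adds to

zero_conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)
nn.init.zeros_(zero_conv.weight)       # zero convolution: weights and bias start at zero
nn.init.zeros_(zero_conv.bias)

downstream = nn.Conv2d(out_ch, out_ch, kernel_size=1)  # stand-in for the frozen downstream blocks
for p in downstream.parameters():
    p.requires_grad_(False)

y = skip + zero_conv(x)                # first forward pass: zero_conv(x) == 0, so y == skip
loss = downstream(y).pow(2).mean()
loss.backward()

g = zero_conv.weight.grad.view(out_ch, in_ch)
print(g)                               # each weight sees d(loss)/dy[o] * x[i], summed over positions
print("all gradients equal?", bool((g == g.flatten()[0]).all()))  # False in general
```

Roughly speaking, the identical-gradient argument in [1] applies when the surrounding network is identically initialized as well; in ControlNet only the zero convolutions start at zero, while their input features and the frozen downstream weights already differ across channels, so after one optimizer step the zero-conv weights are no longer zero and training proceeds normally.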