> This is very nice, thanks!
>
> It would be useful to open an issue to discuss the need for the `Linear` layer here. Hopefully we can make the builtins more flexible so this kind of thing is less necessary.

*Originally posted by @MikeInnes in FluxML/model-zoo#115 (comment)*
The primary need for making a new `Linear` type was that the bias initializer only receives the output dimension. That is intuitive, but it becomes a problem when a bias initialization relies on more than the output dimension. For example, PyTorch's default `nn.Linear` layer bounds its bias initialization by `1/sqrt(fan_in)`, which depends on the input dimension. Relevant code:
```python
import math
from torch.nn import init

def reset_parameters(self):
    # Kaiming-uniform weight init with a = sqrt(5)
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        # The bias bound depends on fan_in, i.e. the *input* dimension
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
```
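
For comparison, here is a minimal sketch of what such a custom layer might look like in Flux. The `Linear` struct and its constructor below are assumptions mirroring PyTorch's scheme, not the exact code from the linked model-zoo PR; the point is only that the bias bound needs the input dimension, which a bias initializer that receives just the output dimension cannot compute.

```julia
using Flux

# Hypothetical custom Linear layer whose bias init depends on fan-in,
# mirroring PyTorch's default scheme (a sketch, not Flux's Dense).
struct Linear{W<:AbstractMatrix,B<:AbstractVector}
    weight::W
    bias::B
end

function Linear(in::Integer, out::Integer)
    # PyTorch's kaiming_uniform_ with a = sqrt(5) reduces to
    # U(-1/sqrt(fan_in), 1/sqrt(fan_in)); the bias uses the same
    # fan_in-dependent bound.
    bound  = Float32(1 / sqrt(in))
    weight = (2 .* rand(Float32, out, in) .- 1) .* bound
    bias   = (2 .* rand(Float32, out) .- 1) .* bound
    Linear(weight, bias)
end

(l::Linear)(x) = l.weight * x .+ l.bias

Flux.@functor Linear  # register fields as trainable parameters
```

As a usage sketch, `Linear(784, 32)` builds a layer whose bias bound `1/sqrt(784)` reflects the input dimension, something an initializer that is only handed `out` has no way to express.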