Description
I was testing in Colab, and when I ran "model.model.layers[0].mlp.gate_proj.weight" I received very different results from yours. You got:
Parameter containing:
tensor([[ 0.0032, -0.0339, 0.0150, ..., 0.0041, -0.0048, 0.0061],
[-0.0105, -0.0049, -0.0586, ..., -0.0092, 0.0188, -0.0084],
[-0.0383, -0.0109, 0.0031, ..., -0.0410, 0.0211, 0.0223],
...,
[ 0.0131, -0.0259, 0.0034, ..., 0.0233, -0.0281, -0.0131],
[ 0.0062, 0.0198, 0.0085, ..., 0.0129, -0.0205, 0.0050],
[ 0.0292, 0.0152, -0.0175, ..., 0.0256, 0.0276, 0.0082]],
device='cuda:0', dtype=torch.bfloat16, requires_grad=True)
I got the following. Every visible entry is ±0.0007, and requires_grad=True is missing from the output:
tensor([[ 0.0007, 0.0007, 0.0007, ..., 0.0007, -0.0007, 0.0007],
[ 0.0007, 0.0007, 0.0007, ..., 0.0007, 0.0007, -0.0007],
[ 0.0007, 0.0007, -0.0007, ..., -0.0007, -0.0007, 0.0007],
...,
[ 0.0007, -0.0007, 0.0007, ..., -0.0007, -0.0007, 0.0007],
[ 0.0007, 0.0007, 0.0007, ..., -0.0007, -0.0007, -0.0007],
[ 0.0007, -0.0007, 0.0007, ..., 0.0007, 0.0007, 0.0007]],
device='cuda:0', dtype=torch.bfloat16)
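To make the two printouts easier to compare, here is a small hypothetical helper (summarize_weight is not part of this repo). It reports whether the tensor is a torch.nn.Parameter, whether requires_grad is set, and how many distinct absolute values it contains. A normally trained bfloat16 weight should show many distinct magnitudes. The tensor I got seems to show only ±0.0007 in the visible entries, so a count of one or two would suggest that this is not the same kind of weight. That reading is an assumption and has not been confirmed.

```python
import torch

def summarize_weight(w: torch.Tensor) -> dict:
    """Hypothetical helper: summarize a weight tensor for side-by-side comparison."""
    # Detach and upcast so torch.unique and the statistics work regardless of dtype.
    flat = w.detach().float().flatten()
    # Count distinct absolute values. A trained weight has many; a sign-only tensor has very few.
    distinct_magnitudes = torch.unique(flat.abs())
    return {
        "is_parameter": isinstance(w, torch.nn.Parameter),
        "requires_grad": w.requires_grad,
        "dtype": w.dtype,
        "num_distinct_magnitudes": distinct_magnitudes.numel(),
        "mean_abs": flat.abs().mean().item(),
    }
```

Usage in Colab: summarize_weight(model.model.layers[0].mlp.gate_proj.weight)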