BitNet_Llama_model_test_huggingface_GPU.ipynb 

I was testing in Colab, and when I ran "model.model.layers[0].mlp.gate_proj.weight" I received very different results from yours. You got:
Parameter containing:
tensor([[ 0.0032, -0.0339,  0.0150,  ...,  0.0041, -0.0048,  0.0061],
        [-0.0105, -0.0049, -0.0586,  ..., -0.0092,  0.0188, -0.0084],
        [-0.0383, -0.0109,  0.0031,  ..., -0.0410,  0.0211,  0.0223],
        ...,
        [ 0.0131, -0.0259,  0.0034,  ...,  0.0233, -0.0281, -0.0131],
        [ 0.0062,  0.0198,  0.0085,  ...,  0.0129, -0.0205,  0.0050],
        [ 0.0292,  0.0152, -0.0175,  ...,  0.0256,  0.0276,  0.0082]],
       device='cuda:0', dtype=torch.bfloat16, requires_grad=True)

I got:
tensor([[ 0.0007,  0.0007,  0.0007,  ...,  0.0007, -0.0007,  0.0007],
        [ 0.0007,  0.0007,  0.0007,  ...,  0.0007,  0.0007, -0.0007],
        [ 0.0007,  0.0007, -0.0007,  ..., -0.0007, -0.0007,  0.0007],
        ...,
        [ 0.0007, -0.0007,  0.0007,  ..., -0.0007, -0.0007,  0.0007],
        [ 0.0007,  0.0007,  0.0007,  ..., -0.0007, -0.0007, -0.0007],
        [ 0.0007, -0.0007,  0.0007,  ...,  0.0007,  0.0007,  0.0007]],
       device='cuda:0', dtype=torch.bfloat16)
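
Every entry in my printout is either 0.0007 or -0.0007, while yours vary in magnitude. To make that easier to compare, here is a minimal sketch that counts how often each absolute value occurs in a list of weights. The helper name `summarize_magnitudes` is my own and not from the notebook. The commented-out lines at the end assume `model` is already loaded as in the notebook, and they only look at the first few rows to keep it cheap.

```python
from collections import Counter


def summarize_magnitudes(values, ndigits=4):
    """Count how often each rounded absolute value occurs in `values`."""
    return Counter(round(abs(float(v)), ndigits) for v in values)


# Synthetic stand-ins copied from the first rows printed above
my_row = [0.0007, 0.0007, 0.0007, 0.0007, -0.0007, 0.0007]
your_row = [0.0032, -0.0339, 0.0150, 0.0041, -0.0048, 0.0061]

print(len(summarize_magnitudes(my_row)))    # number of distinct magnitudes in my row
print(len(summarize_magnitudes(your_row)))  # number of distinct magnitudes in your row

# In the notebook (assumes `model` is loaded as in the earlier cells):
# w = model.model.layers[0].mlp.gate_proj.weight
# print(summarize_magnitudes(w[:8].detach().float().flatten().tolist()))
```

If the count for the real tensor also comes back as a single magnitude, that would suggest the weights I loaded hold only one scaled value with a sign, rather than the varied values you printed.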


