Working around new int4wo weight packing #1389

pytorch/torchchat · Closed (#1404)

@Jack-Khuu

Description

Given the change in output shape/behavior introduced by pytorch/pytorch#139611 and #1278:

Question: what is the recommended way of migrating to the new CPU implementations of

  • _weight_int4pack_mm_for_cpu
  • _convert_weight_to_int4pack_for_cpu

while maintaining the previous behavior?


Specifically, for _convert_weight_to_int4pack:

        # Unpack Q4_0 blocks into int4 values (q), scales (s), and zero points (z)
        q, s, z = Q4_0.unpack(t)
        scales_and_zeros = pack_scales_and_zeros(s, z)
        # Pack pairs of int4 values into one uint8 each: [N, K] -> [N, K/2]
        q_uint8 = (q[::, ::2] << 4 | q[::, 1::2]).to(torch.uint8)
        weight_int4pack = torch.ops.aten._convert_weight_to_int4pack(
            q_uint8, inner_k_tiles
        )

and for _weight_int4pack_mm:

        c = torch.ops.aten._weight_int4pack_mm(
            input,
            weight_int4pack,
            groupsize,
            scales_and_zeros,
        )
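And a corresponding sketch for the matmul, dispatching on device so the non-CPU path keeps the old op; this assumes the _for_cpu variant keeps the same argument order:

        # Assumption: _weight_int4pack_mm_for_cpu mirrors the argument order
        # of _weight_int4pack_mm, but expects the [N, K/2] uint8 packing.
        if input.device.type == "cpu":
            c = torch.ops.aten._weight_int4pack_mm_for_cpu(
                input,
                weight_int4pack,
                groupsize,
                scales_and_zeros,
            )
        else:
            c = torch.ops.aten._weight_int4pack_mm(
                input,
                weight_int4pack,
                groupsize,
                scales_and_zeros,
            )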

Tested: With no code changes

The following error is encountered:

Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend

Tested: Naive (just appending *_for_cpu)

A size mismatch is encountered (expected, since the signatures differ):

size mismatch for model.layers.0.attention.wq.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([256, 16, 32, 4]).
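The mismatch itself looks consistent with the layout change rather than with a broken call: assuming the old op packed to a 4-D int32 layout of [N/8, K/(16*inner_k_tiles), 32, inner_k_tiles/2] while the new CPU op packs to a 2-D uint8 [N, K/2], both shapes in the error fall out of N = 2048, K = 2048, inner_k_tiles = 8 (values inferred from the error message, not confirmed):

        # Hypothetical shape check for the error above; N, K, inner_k_tiles inferred
        N, K, inner_k_tiles = 2048, 2048, 8
        old_shape = (N // 8, K // (16 * inner_k_tiles), 32, inner_k_tiles // 2)  # (256, 16, 32, 4)
        new_shape = (N, K // 2)                                                  # (2048, 1024)

If that holds, a checkpoint serialized under one layout cannot be loaded into a model expecting the other, and the weights would presumably need to be re-quantized (or re-packed) under the new layout.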

cc @yanbing-j @jerryzh168, who worked on the changes.
