Given the change in output shape/behavior in pytorch/pytorch#139611 + #1278:

Question: What is the recommended way of migrating to the new CPU implementations of

- _weight_int4pack_mm_for_cpu
- _convert_weight_to_int4pack_for_cpu

while maintaining the previous behavior?

Specifically, the existing code calls _convert_weight_to_int4pack as follows:
```python
q, s, z = Q4_0.unpack(t)
scales_and_zeros = pack_scales_and_zeros(s, z)
# Pack two int4 values into each byte: [N, K] int -> [N, K/2] uint8
q_uint8 = (q[::, ::2] << 4 | q[::, 1::2]).to(torch.uint8)
weight_int4pack = torch.ops.aten._convert_weight_to_int4pack(
    q_uint8, inner_k_tiles
)
c = torch.ops.aten._weight_int4pack_mm(
    input,
    weight_int4pack,
    groupsize,
    scales_and_zeros,
)
```
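For context, my understanding (an assumption worth double-checking against the PRs) is that the old op returns an int32 tensor in a 4D tile layout [N/8, K/(inner_k_tiles*16), 32, inner_k_tiles/2], while the new CPU op returns a plain uint8 tensor of shape [N, K/2]. A quick shape check under that assumption, with N, K, and inner_k_tiles inferred from the error message further below:

```python
# Hypothetical shape bookkeeping -- the 4D tile layout is an assumption,
# and N/K/inner_k_tiles are inferred from the size-mismatch error below.
N, K, inner_k_tiles = 2048, 2048, 8

old_shape = (N // 8, K // (inner_k_tiles * 16), 32, inner_k_tiles // 2)
new_shape = (N, K // 2)

print(old_shape)  # (256, 16, 32, 4) -- the "current model" shape below
print(new_shape)  # (2048, 1024)     -- the checkpoint shape below
```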
Tested: With no code changes

The following error is encountered:

```
Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend
```
Tested: Naive (just add *_for_cpu)

A size mismatch is encountered (expected, since the signatures differ):

```
size mismatch for model.layers.0.attention.wq.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([256, 16, 32, 4]).
```
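If those layouts are right, the two shapes in the mismatch are just the two packing formats ([2048, 1024] being the new uint8 [N, K/2] output, [256, 16, 32, 4] the old 4D tile layout), and the remaining question is what the *_for_cpu ops expect as input. Is the intended migration something like the sketch below? The key assumption, that _convert_weight_to_int4pack_for_cpu takes the unpacked int32 [N, K] weight so the manual nibble-packing step is dropped, is a guess rather than something confirmed:

```python
# Hypothetical migration sketch -- input dtypes/shapes are assumptions.
q, s, z = Q4_0.unpack(t)
scales_and_zeros = pack_scales_and_zeros(s, z)

# Assumed: the CPU variant takes the unpacked int32 weight [N, K]
# (nibble values 0..15), so the uint8 nibble-packing step goes away.
weight_int4pack = torch.ops.aten._convert_weight_to_int4pack_for_cpu(
    q.to(torch.int32), inner_k_tiles
)  # assumed to return uint8 [N, K/2] rather than the old 4D tile layout

c = torch.ops.aten._weight_int4pack_mm_for_cpu(
    input,
    weight_int4pack,
    groupsize,
    scales_and_zeros,
)
```

If that is the case, checkpoints saved in the old packed layout would presumably also need to be re-converted rather than loaded directly.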
cc @yanbing-j @jerryzh168, who worked on these changes.