Right now what we have are docstrings, but they could use work. This came up as @vayuda was looking at extending his bitpacking work to include a notion of scales.
- What does "tensor core tiled" layout mean? It's not a googlable term, and it seems to mean the weight is packed into a format that tinygemm can understand, i.e. the format consumed by
  `torch.ops.aten._weight_int4pack_mm(input_tensor.contiguous(), packed_weight, groupsize, scale_and_zero)`
- It's kind of unclear why the scale and zero point are passed as a single `scale_and_zero` tensor
- `innerKTiles` is never defined
- The API does not describe how it wants to be used (see the sketch after this list)
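To make the intended call sequence concrete, here is a minimal, hedged sketch of driving the tinygemm int4 path end to end. The shapes follow the docstring quoted below, but the exact dtype and packing requirements of `_convert_weight_to_int4pack` vary across PyTorch versions, so treat the specifics (the `n`, `k`, `groupsize` values and the `[k // groupsize][n][2]` layout of `scale_and_zero`) as assumptions rather than documented API:

```python
# Illustrative sketch only -- shapes/dtypes are assumptions, not documented API.
import torch

n, k, groupsize, inner_k_tiles = 32, 128, 32, 8

# Quantized weight as int4 values (stored as int32), shape [n][k].
weight_int4 = torch.randint(0, 16, (n, k), dtype=torch.int32, device="cuda")

# Pack into the 4-d "tensor core tiled" layout described in the docstring:
# [n / 8][k / (inner_k_tiles * 16)][32][inner_k_tiles / 2]
packed_weight = torch.ops.aten._convert_weight_to_int4pack(weight_int4, inner_k_tiles)

# scale_and_zero interleaves per-group scales and zero points into one tensor
# (assumed shape [k // groupsize][n][2], last dim = (scale, zero)), presumably
# so the kernel can fetch both with a single load.
scale = torch.rand(k // groupsize, n, 1, dtype=torch.bfloat16, device="cuda")
zero = torch.rand(k // groupsize, n, 1, dtype=torch.bfloat16, device="cuda")
scale_and_zero = torch.cat([scale, zero], dim=-1)

x = torch.randn(4, k, dtype=torch.bfloat16, device="cuda")
out = torch.ops.aten._weight_int4pack_mm(x.contiguous(), packed_weight, groupsize, scale_and_zero)
print(out.shape)  # torch.Size([4, 32])
```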
```python
@register_aqt_layout_cls("tensor_core_tiled")
class TensorCoreTiledAQTLayout(AQTLayout):
    """
    Layout storage class for the tensor_core_tiled layout of an affine quantized tensor. This is for int4 only;
    it stores the original tensor of dimension [n][k] (int32 dtype) as a packed weight, a 4-d tensor of
    dimension: [n / 8][k / (innerKTiles * 16)][32][innerKTiles / 2]
    TODO: innerKTiles is hardcoded as 8 currently, we'll make this an argument later after deciding
    on the API
    fields:
      packed_weight (torch.Tensor): the 4-d packed tensor in a tensor_core_tiled layout
      scale_and_zero (torch.Tensor): the combined scale and zero_point Tensor used to map between the floating point tensor and the quantized tensor
    """
```