Move Int8DynamicActivationIntxWeightConfig out of experimental #1968
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1968
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 0763e90 with merge base 9516764.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@dataclass
class Int8DynamicActivationIntxWeightConfig(AOBaseConfig):
    """
@andrewor14 can you have a look at this comment and check whether there are any issues with it working well with the QAT workflow with FakeQuantizeConfig?
andrewor14 left a comment:
Thanks @metascroy, looks great overall. I pointed out some fields that appear to be different/missing in the comments but I think the new config will work well with QAT. Either way we'll probably need an end-to-end QAT test to confirm that prepare vs convert numerics match exactly (can be future PR). Also left some questions about the new layout.
torchao/quantization/quant_api.py (Outdated)
| """ | ||
|
|
||
| weight_dtype: torch.dtype = torch.int8 | ||
| weight_granularity: Union[PerRow, PerGroup] = PerGroup(32) |
I feel we should just make the type here Granularity and throw an error for unsupported types, so we don't tie ourselves to specific granularities in the signature itself.
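A minimal sketch of what that could look like (hypothetical class name, for illustration only; not the PR's code):

```python
from dataclasses import dataclass, field

from torchao.quantization.granularity import Granularity, PerAxis, PerGroup


@dataclass
class _GranularityCheckedConfig:  # hypothetical, only illustrates the suggestion
    # Accept the general Granularity type in the signature...
    weight_granularity: Granularity = field(default_factory=lambda: PerGroup(32))

    def __post_init__(self):
        # ...and reject unsupported granularities at construction time.
        if not isinstance(self.weight_granularity, (PerGroup, PerAxis)):
            raise ValueError(
                f"Unsupported weight_granularity: {type(self.weight_granularity)}"
            )
```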
if isinstance(weight_granularity, PerGroup):
    group_size = weight_granularity.group_size
elif isinstance(weight_granularity, PerRow):
    group_size = weight.shape[-1]
As discussed offline, this seems more like per-channel to me, which is expressed in terms of PerAxis. PerRow seems like an unrelated float8 thing according to the docstring of class PerRow(Granularity) at ao/torchao/quantization/granularity.py, line 74 (commit 5802d2d).
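For reference, per-channel weight granularity would be spelled with PerAxis; a sketch of the usage, assuming the granularity classes in the file linked above:

```python
from torchao.quantization.granularity import PerAxis

# One scale (and zero point) per output channel of a [out_features, in_features]
# Linear weight, i.e. per row of the weight matrix.
weight_granularity = PerAxis(axis=0)
```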
weight_granularity = config.weight_granularity
weight_zero_point_domain = config.weight_zero_point_domain
weight_mapping_type = config.weight_mapping_type
weight_scale_dtype = config.weight_scale_dtype
Can you add a TODO for Int8DynamicActivationInt4WeightConfig to add scale dtype there as well?
@dataclass
class Int8DynamicActivationIntxWeightConfig(AOBaseConfig):
Does this match the numerics of Int8DynamicActivationInt4WeightConfig exactly if we choose weight_dtype = torch.int4? Is that a goal of this?
Whether or not this is the goal, maybe we should document it somewhere, either here or in Int8DynamicActivationInt4WeightConfig's docstring, because users may be confused about this.
It does match numerics exactly when weight_dtype = torch.int4
    weight_granularity: Union[PerRow, PerGroup] = PerGroup(32)
    weight_zero_point_domain: ZeroPointDomain = ZeroPointDomain.NONE
    weight_mapping_type: MappingType = MappingType.SYMMETRIC
    weight_scale_dtype: Optional[torch.dtype] = None
I notice there's no weight_zero_point_dtype here. Is this assuming weight will always be symmetric? FWIW in the corresponding QAT FakeQuantizeConfig we do have zero_point_precision as well as scale_precision
Asymmetric is supported. weight_zero_point_dtype is set to torch.int8 if weight_zero_point_domain=ZeroPointDomain.INT, else it is None if weight_zero_point_domain=ZeroPointDomain.NONE.
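In other words (a small sketch of the behavior described above, not the torchao source; the helper name is made up):

```python
from typing import Optional

import torch
from torchao.quantization.quant_primitives import ZeroPointDomain


def _weight_zero_point_dtype(domain: ZeroPointDomain) -> Optional[torch.dtype]:
    # INT zero points are stored as int8; NONE means no zero point is kept.
    return torch.int8 if domain == ZeroPointDomain.INT else None
```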
torchao/quantization/quant_api.py (Outdated)
    weight_mapping_type: MappingType = MappingType.SYMMETRIC
    weight_scale_dtype: Optional[torch.dtype] = None
    act_mapping_type: MappingType = MappingType.ASYMMETRIC
    layout: Layout = PackedLinearInt8DynamicActivationIntxWeightLayout(
What happens if we use this layout for CUDA or other non-CPU backends? Are the numerics the same / is it still optimized? Does this also work with PlainLayout? Just wondering if this is the right default.
It works with QDQLayout, which subclasses PlainLayout(), but explicitly defines the linear impl.
Today this is done via a fallback path with PlainLayout() that @jerryzh168 mentioned might be removed.
We could make QDQLayout the default. PackedLinearInt8DynamicActivationIntxWeightLayout only works on CPU.
preserve_zero=has_weight_zeros
or (weight_mapping_type == MappingType.SYMMETRIC),
zero_point_domain=weight_zero_point_domain,
_layout=QDQLayout(),
Do we just ignore config.layout here? Or does that refer to activation layout only?
See the comment on line 775. QDQLayout is used for the quantization algorithm.
The packing for the PackedLinearInt8DynamicActivationIntxWeightLayout layout is handled in the block at line 804.
@andrewor14 @jerryzh168 any more concerns on this PR?
class _AffineQuantizedTensor(AffineQuantizedTensor):
def make_packed_linear_int8_dynamic_activation_intx_weight_tensor(
Not for this PR, but would it make sense to have a separate tensor subclass for this layout that inherits from AQT?
Perhaps that would be OK, but I'm wondering how that intersects with the quantize_ config (which has layout). Note that this layout does not override anything in AQT, so it doesn't need to be a separate class.
I just wanted a way for users to construct this AQT from plain data, which is kind of useful if people want to use it outside of the quantize_ API. There is no easy, packaged-up way for users to do that with AQT right now.
> Perhaps that would be OK, but I'm wondering how that intersects with the quantize_ config (which has layout).

You can continue to use layout as an abstraction if you feel that is useful.

> Note that this layout does not override anything in AQT, so it doesn't need to be a separate class.

OK, then it seems fine to keep using AQT.

> I just wanted a way for users to construct this AQT from plain data, which is kind of useful if people want to use it outside of the quantize_ API. There is no easy, packaged-up way for users to do that with AQT right now.

It seems that using the default AQT constructor is fine for now? Or do you feel we should add another constructor function?
> or do you feel we should add another constructor function

Let's wait and see if others want that functionality. The current way to construct an AQT from plain data would be something like:
layout = PackedLinearInt8DynamicActivationIntxWeightLayout(target=target)
tensor_impl = PackedLinearInt8DynamicActivationIntxWeightAQTTensorImpl.from_plain(
    int_data, scale, zero_point, layout, bias
)
aqt = AffineQuantizedTensor(
    tensor_impl,
    block_size=(1, group_size),
    shape=int_data.shape,
    quant_min=qmin,
    quant_max=qmax,
    zero_point_domain=ZeroPointDomain.INT
    if has_weight_zeros
    else ZeroPointDomain.NONE,
)
which isn't the most intuitive IMO.
This actually looks reasonable to me, similar to torch.Tensor, where people can construct a TensorImpl and pass it to the Tensor constructor.
for weight_dtype in [getattr(torch, f"int{i}") for i in range(1, 9)]:
    for has_weight_zeros in [True, False]:
        for has_bias in [True, False]:
            idx = len(layers)
            layer_to_weight_dtype[idx] = weight_dtype
            layer_to_has_weight_zeros[idx] = has_weight_zeros
            layers.append(torch.nn.Linear(64, 64, bias=has_bias))
activations = torch.randn(2, 1, 64, dtype=torch.float32)
for weight_mapping_type in [MappingType.ASYMMETRIC, MappingType.SYMMETRIC]:
You can use itertools.product here, as in this existing test line:
for algo, layer_size, input_shape, high_precision_dtype in itertools.product(
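For illustration, the nested loops in the snippet above could collapse to something like this (a sketch reusing the names from that snippet; the surrounding test setup is assumed):

```python
import itertools

import torch

# Assumed setup from the test snippet above.
layers = []
layer_to_weight_dtype = {}
layer_to_has_weight_zeros = {}

int_dtypes = [getattr(torch, f"int{i}") for i in range(1, 9)]
for weight_dtype, has_weight_zeros, has_bias in itertools.product(
    int_dtypes, [True, False], [True, False]
):
    idx = len(layers)
    layer_to_weight_dtype[idx] = weight_dtype
    layer_to_has_weight_zeros[idx] = has_weight_zeros
    layers.append(torch.nn.Linear(64, 64, bias=has_bias))
```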
jerryzh168 left a comment:
LGTM
Also labeling this as a new feature; it might be helpful to write down how to use the API in the summary so we can copy-paste it for release notes.
Looks great! @metascroy can you add a unit test in a separate PR showing this new config matches Int8DynamicActivationInt4WeightConfig numerics exactly if weight_dtype = torch.int4?
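A hedged sketch of what such a test might look like (constructor arguments and import paths are assumptions based on this PR's diff and the existing config; per the PR description, the exact match is expected with weight_dtype=torch.int4 and layout=QDQLayout()):

```python
import copy

import torch
from torchao.quantization.granularity import PerGroup
from torchao.quantization.quant_api import (
    Int8DynamicActivationInt4WeightConfig,
    Int8DynamicActivationIntxWeightConfig,
    quantize_,
)


def test_intx_matches_int4_numerics():
    torch.manual_seed(0)
    m_int4 = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()
    m_intx = copy.deepcopy(m_int4)
    x = torch.randn(2, 64)

    quantize_(m_int4, Int8DynamicActivationInt4WeightConfig(group_size=32))
    # Field names follow the diff above; layout=QDQLayout() may also be needed
    # for an exact match, per the PR description.
    quantize_(m_intx, Int8DynamicActivationIntxWeightConfig(
        weight_dtype=torch.int4,
        weight_granularity=PerGroup(32),
    ))

    torch.testing.assert_close(m_int4(x), m_intx(x), rtol=0, atol=0)
```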
This PR moves Int8DynamicActivationIntxWeightConfig and its quantizer into torchao.quantization.quant_api (out of experimental). Int8DynamicActivationIntxWeightConfig is refactored to closely mirror Int8DynamicActivationInt4WeightConfig when weight_dtype=torch.int4 and layout=QDQLayout().
Quantization in Int8DynamicActivationIntxWeightConfig is done with QDQLayout, and packing is then done separately with make_packed_linear_int8_dynamic_activation_intx_weight_tensor. This separates the quantization algorithm from the storage. Both the packed layout and QDQLayout quantize with the same algorithm, and this is made explicit.
Example API usage:
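A minimal sketch (field names follow the diff above; import locations and defaults should be checked against the merged quant_api.py):

```python
import torch
from torchao.quantization.granularity import PerGroup
from torchao.quantization.quant_api import (
    Int8DynamicActivationIntxWeightConfig,
    quantize_,
)

model = torch.nn.Sequential(torch.nn.Linear(64, 64))

# Values here are illustrative; weight_dtype can be any of torch.int1..torch.int8.
config = Int8DynamicActivationIntxWeightConfig(
    weight_dtype=torch.int4,
    weight_granularity=PerGroup(32),
)
quantize_(model, config)

# The linear layers now use int8 dynamic activation / intx weight quantization.
out = model(torch.randn(2, 64))
```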