Change choose_qparams_per_token to choose_qparams_per_token_asymmetric (#61)

jerryzh168 · facebook-github-bot · commit 9c048eba460f · 2024-03-18T16:10:15.000-07:00
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #61

This is needed for xnnpack, we can support other patterns later

Pull Request resolved: #61

Test Plan: CI, will be tested when xnnpack lowering is ready

Reviewed By: andrewor14

Differential Revision: D55031545

Pulled By: jerryzh168

fbshipit-source-id: 3908bf0e6e5638b611300b0a45cedabb3c0592b3
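
For readers unfamiliar with the term, the following is a minimal pure-Python sketch of what "per-token asymmetric" parameter selection generally means: each token (row) gets its own scale and a non-zero-centered zero point derived from that row's min and max. This is an illustration under stated assumptions, not the implementation behind `torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric`; the function names, the `eps` floor, and the rounding rule are hypothetical.

```python
# Illustrative sketch only: names, eps, and rounding are assumptions,
# not the actual quantized_decomposed op.

def choose_qparams_asymmetric_row(row, quant_min=-128, quant_max=127, eps=1e-9):
    """Pick (scale, zero_point) for one token so [min, max] maps onto [quant_min, quant_max]."""
    # Extend the range to include zero so that 0.0 is exactly representable.
    min_val = min(min(row), 0.0)
    max_val = max(max(row), 0.0)
    # Floor the scale at eps to avoid division by zero for all-zero rows.
    scale = max((max_val - min_val) / (quant_max - quant_min), eps)
    # Asymmetric: the zero point shifts so that min_val lands on quant_min.
    zero_point = round(quant_min - min_val / scale)
    # Clamp the zero point into the representable integer range.
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


def choose_qparams_per_token_sketch(rows):
    """Apply the row-wise selection independently to every token (row)."""
    return [choose_qparams_asymmetric_row(row) for row in rows]


if __name__ == "__main__":
    tokens = [[0.0, 2.55], [-1.0, 3.0]]
    for scale, zero_point in choose_qparams_per_token_sketch(tokens):
        print(scale, zero_point)
```

A symmetric scheme would instead fix the zero point at 0 and derive the scale from the largest absolute value in the row, which is the distinction the rename in this commit makes explicit.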
diff --git a/torchao/quantization/quant_primitives.py b/torchao/quantization/quant_primitives.py
@@ -1070,10 +1070,11 @@ def unpack_int4_to_int8(int8_data: torch.Tensor) -> torch.Tensor:
 
 def per_token_dynamic_quant(input: torch.Tensor) -> torch.Tensor:
     orig_dtype = input.dtype
+    # TODO: we may need to make the choose_qparams op configurable
     (
         scales,
         zero_points,
-    ) = torch.ops.quantized_decomposed.choose_qparams_per_token(input, torch.int8)
+    ) = torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric(input, torch.int8)
 
     # TODO: get these from torch.int8
     quant_min = -128