[CPU] add Float8OpaqueTensor for dynamic float8 act float8 weight #3075
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3075
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit c7524ea with merge base 1a9b6f4:
BROKEN TRUNK - The following job failed but was also present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
CC @mingfeima for review. Thanks.

Hi @mingfeima @jerryzh168 @andrewor14 Could you please review this PR? Thanks.
test/quantization/quantize_/workflows/float8/test_float8_opaque_tensor.py
Hi @mingfeima @jerryzh168 @andrewor14 Though this PR depends on #3100, could you please review this PR? Thanks.

@jerryzh168 Could you please review this PR again? Thanks.
Hi @jerryzh168 It's been a while since the last update of this PR because of reversions of prior PRs. I have rebased this PR. And for the …
```python
common_utils.instantiate_parametrized_tests(TestFloat8OpaqueTensor)
```
nit: seems that we can add this to the TestFloat8OpaqueTensor class as a decorator
Updated. Thanks.
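(For context, the decorator form the reviewer suggests looks like the sketch below; the test body is illustrative, not the PR's actual test.)

```python
from torch.testing._internal import common_utils


# instantiate_parametrized_tests can be applied as a class decorator instead
# of being called on the class after its definition.
@common_utils.instantiate_parametrized_tests
class TestFloat8OpaqueTensor(common_utils.TestCase):
    @common_utils.parametrize("bias", [True, False])
    def test_dynamic_float8_linear(self, bias):
        ...  # placeholder body
```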
| @common_utils.parametrize("x_dim", [2, 3]) | ||
| @common_utils.parametrize("bias", [True, False]) | ||
| @common_utils.parametrize("bs", [4, 128]) | ||
| def test_dynamic_float8_linear_ref(self, dtype, x_dim, bias, bs): |
what does ref mean? what's the difference between this test and the previous one?
It tests the fallback path in the kernel. I have updated the name and added a comment to explain this. Thanks.
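(For context: a reference-path test of this kind typically checks the kernel output against a plain high-precision linear within a loose tolerance. The sketch below emulates float8 with a simple cast round-trip; names and tolerances are assumptions, not the PR's code.)

```python
import torch

x = torch.randn(4, 64)
w = torch.randn(32, 64)
y_ref = torch.nn.functional.linear(x, w)  # high-precision reference

# Emulate float8 rounding via a cast round-trip (no scaling, for brevity).
x_q = x.to(torch.float8_e4m3fn).to(torch.float32)
w_q = w.to(torch.float8_e4m3fn).to(torch.float32)
y = torch.nn.functional.linear(x_q, w_q)

# Loose tolerance: float8 carries only ~2-3 significant decimal digits.
torch.testing.assert_close(y, y_ref, atol=0.5, rtol=0.5)
```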
```python
example_inputs = (example_inputs[0].unsqueeze(0),)
y = m(*example_inputs)

with torch.no_grad():
```
can do 8e3b3da in setUp
Updated. Thanks.
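(A minimal sketch of the suggested refactor; the exact contents of commit 8e3b3da aren't shown in this thread, so the body below is an assumed example of the pattern, not the actual change.)

```python
import torch
from torch.testing._internal import common_utils


class TestFloat8OpaqueTensor(common_utils.TestCase):
    def setUp(self):
        super().setUp()
        # Assumed: shared per-test preparation hoisted here so individual
        # tests don't have to repeat it.
        torch.manual_seed(0)
```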
```python
class ToyLinearModel(torch.nn.Module):
```
we also just landed a util for this b4ec4cb
Updated. Thanks.
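(For readers without the repo open: the toy model being replaced typically looks like the sketch below; the landed util from b4ec4cb may differ in signature.)

```python
import torch


class ToyLinearModel(torch.nn.Module):
    """Two stacked linears; just enough structure to exercise quantize_."""

    def __init__(self, k: int = 64, n: int = 32, dtype=torch.bfloat16):
        super().__init__()
        self.linear1 = torch.nn.Linear(k, n, bias=False, dtype=dtype)
        self.linear2 = torch.nn.Linear(n, k, bias=True, dtype=dtype)

    def forward(self, x):
        return self.linear2(self.linear1(x))
```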
```python
from torchao import quantize_
from torchao.quantization import PerGroup, PerRow, PerTensor
from torchao.quantization.quant_api import (
```
nit: we can just import from torchao.quantization; I feel in the end we might be able to make quant_api.py an implementation detail and not expose it to users
Updated. Thanks.
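(The resulting import style, assuming the config class is re-exported from torchao.quantization, which is the point of the suggestion:)

```python
from torchao import quantize_
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    PerGroup,
    PerRow,
    PerTensor,
)
```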
torchao/quantization/quant_api.py (Outdated)
```diff
     return weight

-    elif not _fp8_mm_compat(weight):
+    elif not is_cpu and not _fp8_mm_compat(weight):
```
maybe check packing_format instead of device, since we are trying to make this not device-specific?
Updated. Thanks.
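(A sketch of the suggested shape of the check. `PackingFormat.OPAQUE` is assumed here as the name of the CPU opaque layout; the PR's actual field and enum value may differ.)

```python
# Gate the CUDA-oriented compatibility check on the packing format rather
# than on the weight's device.
if config.packing_format != PackingFormat.OPAQUE and not _fp8_mm_compat(weight):
    # Weight shape is unsuitable for fp8 matmul on this path; leave it unquantized.
    return weight
```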
torchao/quantization/quant_api.py (Outdated)
```diff
     return weight

-    if isinstance(weight_granularity, PerRow):
+    if not is_cpu and isinstance(weight_granularity, PerRow):
```
same here
Updated. Thanks.
| "Config Deprecation: version 1 of Float8DynamicActivationFloat8WeightConfig is deprecated and will no longer be supported in a future release, please use version 2, see https://github.com/pytorch/ao/issues/2649 for more details" | ||
| ) | ||
|
|
||
| _check_hardware_support(granularity) |
this function name seems too general, but we can improve this later
Yeah, this function is not added by this PR. I just moved it here for the config.version == 1 path.
```python
    *,
    parameter_name: str = "weight",
):
    assert is_sm_at_least_89() or is_MI300(), (
```
why is this removed? is it because it's already checked in _check_hardware_support?
It's probably due to rebase conflicts. I have added this back. Thanks.
```python
assert is_sm_at_least_89() or is_MI300(), (
    "Float8 dynamic activation quantization is only supported on CUDA>=8.9 and MI300+"
)
if config.set_inductor_config:
```
also why is this removed?
It's probably due to rebase conflicts. I have added this back. Thanks.
```python
    PerRow,
    PerTensor,
)
from torchao.quantization.observer import get_block_size
```
why import from observer? I thought it was moved to torchao.quantization.utils?
Updated. Thanks.
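(For context, get_block_size maps a tensor shape plus a granularity to the block_size consumed by the quant primitives; the return values below are inferred from the granularity semantics.)

```python
from torchao.quantization import PerGroup, PerRow, PerTensor
from torchao.quantization.utils import get_block_size

shape = (4, 128)
get_block_size(shape, PerTensor())   # (4, 128): one scale for the whole tensor
get_block_size(shape, PerRow())      # (1, 128): one scale per row
get_block_size(shape, PerGroup(32))  # (1, 32): one scale per 32-element group
```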
| f"Shapes of input and weight do not match, input:{input_tensor.shape}, weight: {weight_tensor.shape}" | ||
| ) | ||
|
|
||
| act_mat = input_tensor.contiguous() |
isn't this going to be slow?
On CPU, we require input tensors to be contiguous. In practice we almost always get contiguous inputs, so the reordering won't actually happen; the call just enforces the assumption.
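(This matches PyTorch semantics: `.contiguous()` returns the same tensor object when the input is already contiguous, so no copy happens in the common case.)

```python
import torch

x = torch.randn(128, 64)
assert x.contiguous() is x        # already contiguous: no copy, same object

xt = x.t()                        # a transposed, non-contiguous view
assert not xt.is_contiguous()
assert xt.contiguous() is not xt  # only this case materializes a copy
```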
```python
granularity = weight_tensor.act_quant_kwargs.granularity
if isinstance(granularity, PerGroup):
    group_size = granularity.group_size
    if weight_tensor.block_size[1] < weight_tensor.size(-1):
        # weight_tensor is also per-group quantized
        assert weight_tensor.block_size[1] == group_size, (
            "input and weight should have the same group size but got"
            f" {weight_tensor.block_size[1]} and {group_size}"
        )
act_block_size = get_block_size(act_mat.shape, granularity)
act_scale = _choose_scale_float8(
    act_mat,
    float8_dtype=torch.float8_e4m3fn,
    block_size=act_block_size,
)
act_mat = _quantize_affine_float8(act_mat, act_scale, torch.float8_e4m3fn)
```
why is this not using `input_tensor = _choose_quant_func_and_quantize_tensor(...)`?
Thanks for the pointer. However, _choose_quant_func_and_quantize_tensor does the following:

```python
if isinstance(quant_kwargs, QuantizeTensorToFloat8Kwargs):
    return Float8Tensor.from_hp(...)
```

Unfortunately, Float8OpaqueTensor also uses QuantizeTensorToFloat8Kwargs, so it cannot distinguish the two.
Besides, in the implementation of Float8Tensor, the activation is quantized by Float8Tensor.from_hp into a Float8Tensor and then unwrapped to get the quantized tensor data for computation, and this part of the logic is not exposed to users. So I feel it's unnecessary to use Float8OpaqueTensor.from_hp to quantize the activation and then unwrap it; quantizing it with _quantize_affine_float8 looks good.
What do you think? If you want Float8OpaqueTensor to be aligned with Float8Tensor, we may need to define a counterpart of QuantizeTensorToFloat8Kwargs for Float8OpaqueTensor so that we can distinguish them. Thanks.
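(A minimal sketch of that counterpart-kwargs idea; all names below are hypothetical, following the existing QuantizeTensorToFloat8Kwargs pattern.)

```python
from dataclasses import dataclass


@dataclass
class QuantizeTensorToFloat8OpaqueKwargs(QuantizeTensorToFloat8Kwargs):
    """Hypothetical marker subclass so dispatch can tell the opaque CPU path
    apart from the plain Float8Tensor path."""


def _choose_quant_func_and_quantize_tensor(tensor, quant_kwargs):
    # The subclass must be checked before the parent: an isinstance() check
    # against the parent class would also match the subclass.
    if isinstance(quant_kwargs, QuantizeTensorToFloat8OpaqueKwargs):
        return Float8OpaqueTensor.from_hp(tensor, ...)
    if isinstance(quant_kwargs, QuantizeTensorToFloat8Kwargs):
        return Float8Tensor.from_hp(tensor, ...)
```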
```python
packed_weight,
scale,
bias.float() if bias is not None else bias,  # requires bias to be float
torch.float,  # out_dtype
```
shouldn't this align with the original activation dtype orig_dtype? Oh, or are you trying to do this for better precision?
Updated. Thanks.
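(A small illustration of the dtype contract being fixed: with torch.float hard-coded, a bfloat16 activation silently produced a float32 output; threading the original dtype through restores it. Names here are assumptions for illustration.)

```python
import torch

x = torch.randn(4, 8, dtype=torch.bfloat16)
w = torch.randn(8, 8)

y_fp32 = torch.nn.functional.linear(x.float(), w)  # kernel math in fp32
y = y_fp32.to(x.dtype)  # cast back so out_dtype follows the activation's orig_dtype
assert y.dtype == torch.bfloat16
```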
```python
assert K % block_size[1] == 0, (
    f"Expecting in_features {K} to be multiple of group_size {block_size[1]}, but got {K}"
)
scale = _choose_scale_float8(
```
I recently found #3324, does it affect your per-tensor use case?
Thanks for the reminder. I didn't run into any issues with this.
Hi @jerryzh168 I have updated this PR per your comments. Could you please review again? Thanks.
Summary
We split the original big PR #2505 into the following smaller ones:
Test plan