【Hackathon 9th No.71】add test_weight_only.py #4109

Echo-Nie · 2025-09-15T09:34:35Z

add test_weight_only.py

Input shape: [2, 16]
Weight int8 shape: [16, 16]
Weight scale shape: [16]
Bias shape: [16]
Int32 accum stats: min=-430.19, max=523.19, mean=63.72
Scaled stats (after weight_scale): min=-7.14, max=9.53, mean=1.14
Final output (after bias) stats: min=-8.45, max=8.66, mean=1.37


=== Reference float output ===
[[ 2.6108642 -0.6497458  7.0613937  2.0976558  8.218029   4.234454
   2.2585444  0.29855    2.5939693 -3.258247   4.2500834  1.8264745
   2.0854921 -3.9599     8.660601   2.673855 ]
 [ 3.8692675  2.6785803 -2.6663718  5.2733607 -8.454243   1.9710996
  -2.3115897  3.712538  -0.7427913 -0.841658  -0.1247133  3.0144439
  -2.3901563  5.688132  -1.8924255 -3.824362 ]]
=== Quantized output ===
[[ 2.611  -0.65    7.062   2.098   8.22    4.234   2.258   0.2986  2.594
  -3.258   4.25    1.826   2.086  -3.959   8.664   2.674 ]
 [ 3.87    2.678  -2.666   5.273  -8.45    1.971  -2.312   3.713  -0.7427
  -0.842  -0.1247  3.014  -2.39    5.688  -1.893  -3.824 ]]
Max abs diff: 0.0034618378 , Mean abs diff: 0.00049396604
..
----------------------------------------------------------------------
Ran 4 tests in 0.127s

OK
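
For readers following the log above, here is a minimal NumPy sketch of the same weight-only pipeline: int8 weights, a per-channel scale, then bias. It is not the test from this PR. It assumes the weight is stored as [out_features, in_features] with one scale per output channel, it accumulates in float32 because only the weights are quantized, and it uses random data, so the printed numbers will not match the log.

```python
import numpy as np

rng = np.random.default_rng(0)
in_features, out_features = 16, 16  # same sizes as in the log above

x = rng.standard_normal((2, in_features)).astype("float32")
weight = rng.standard_normal((out_features, in_features)).astype("float32")
bias = rng.standard_normal(out_features).astype("float32")

# Symmetric per-output-channel int8 quantization (assumed layout: [out, in]).
scale = (np.abs(weight).max(axis=1) / 127.0).astype("float32")  # shape [16]
weight_int8 = np.clip(np.round(weight / scale[:, None]), -128, 127).astype("int8")

# Weight-only path: activations stay in floating point.
accum = x @ weight_int8.astype("float32").T  # raw accumulation, shape [2, 16]
scaled = accum * scale                       # apply the per-channel weight scale
quant_out = scaled + bias                    # add bias

# Float reference computed from the unquantized weights.
ref_out = x @ weight.T + bias

abs_diff = np.abs(ref_out - quant_out)
print("Max abs diff:", abs_diff.max(), ", Mean abs diff:", abs_diff.mean())
```

Because each weight is rounded by at most half a quantization step, the difference in output column `o` of row `b` is bounded by `sum(|x[b]|) * scale[o] / 2`, which gives a principled tolerance instead of a hand-picked one.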

paddle-bot · 2025-09-15T09:34:41Z

Thanks for your contribution!

Sunny-bot1 · 2025-09-23T07:23:58Z

tests/quantization/test_weight_only.py

+    def test_apply_numerical_precision(self):
+        """Test numerical precision of quantized output"""
+        x = paddle.to_tensor(np.random.randn(2, self.in_features).astype("float16"))
+
+        # Reference FP32 output
+        ref_out = paddle.matmul(
+            x.astype("float32"),
+            (self.layer.weight.astype("float32") * self.layer.weight_scale.astype("float32")).transpose([1, 0]),
+        )
+        if self.layer.bias is not None:
+            ref_out += self.layer.bias.astype("float32")
+
+        # Manual quantized output
+        weight_f32 = self.layer.weight.astype("float32")
+        x_f32 = x.astype("float32")
+        quant_out = paddle.matmul(x_f32, weight_f32 * self.layer.weight_scale.astype("float32"), transpose_y=True)
+        if self.layer.bias is not None:
+            quant_out += self.layer.bias.astype("float32")
+        quant_out = quant_out.astype("float16")
+
+        np.testing.assert_allclose(ref_out.numpy(), quant_out.numpy(), rtol=1e-2, atol=1e-2)


This does not seem to check the result of self.method.apply.
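
To illustrate the point, the quoted test compares two copies of the same formula, so it passes regardless of what the method under test does. A minimal sketch of the intended shape of such a test follows. `apply_weight_only` is a hypothetical stand-in written in NumPy, not the FastDeploy `apply` API, and the [out_features, in_features] weight layout is an assumption.

```python
import unittest

import numpy as np


def apply_weight_only(x, weight_int8, scale, bias=None):
    """Hypothetical stand-in for the method under test (not the FastDeploy API)."""
    out = (x.astype("float32") @ weight_int8.astype("float32").T) * scale
    if bias is not None:
        out = out + bias
    return out.astype("float16")


class TestApplyPrecision(unittest.TestCase):
    def setUp(self):
        rng = np.random.default_rng(0)
        self.x = rng.standard_normal((2, 16)).astype("float16")
        w32 = rng.standard_normal((16, 16)).astype("float32")
        # Per-output-channel symmetric int8 quantization.
        self.scale = np.abs(w32).max(axis=1) / 127.0
        self.weight_int8 = np.clip(
            np.round(w32 / self.scale[:, None]), -128, 127
        ).astype("int8")

    def test_apply_matches_reference(self):
        # Reference: dequantize the weights, then do the matmul in float32.
        dequant = self.weight_int8.astype("float32") * self.scale[:, None]
        ref = self.x.astype("float32") @ dequant.T

        # The value being compared must come from the code under test.
        out = apply_weight_only(self.x, self.weight_int8, self.scale)

        np.testing.assert_allclose(
            out.astype("float32"), ref, rtol=1e-2, atol=1e-2
        )
```

In the real test the call to `apply_weight_only` would be replaced by the call to `self.method.apply`, with whatever arguments that method actually takes.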

Sunny-bot1 · 2025-09-23T07:27:19Z

tests/quantization/test_weight_only.py

+from fastdeploy.model_executor.layers.quantization.weight_only import (
+    GPUWeightOnlyLinearMethod,


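Besides GPUWeightOnlyLinearMethod, weight_only.py also contains MacheteWeightOnlyLinearMethod, which uses a different kernel implementation. If you have an H-series GPU environment, you could also add a test case for MacheteWeightOnlyLinearMethod.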
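
One way to keep such a case from failing on other hardware is to skip it unless an H-series GPU is present. The sketch below assumes that "H-series" means compute capability 9.x (Hopper); whether Machete in this repository has exactly that requirement is not confirmed here. `is_hopper` and `DEVICE_CAPABILITY` are hypothetical names, and the capability tuple is hard-coded so the example runs without a GPU. In a real test it would be queried from the runtime.

```python
import unittest


def is_hopper(capability):
    """Return True for compute capability 9.x (assumed to mean H-series)."""
    major, _minor = capability
    return major == 9


# Hypothetical placeholder: a real test would query the device at runtime
# instead of hard-coding the (major, minor) tuple.
DEVICE_CAPABILITY = (8, 0)


class TestMacheteCase(unittest.TestCase):
    @unittest.skipUnless(
        is_hopper(DEVICE_CAPABILITY), "requires an H-series (SM90) GPU"
    )
    def test_machete_apply(self):
        # Placeholder body: the Machete-specific checks would go here.
        self.assertTrue(is_hopper(DEVICE_CAPABILITY))
```

With the hard-coded value above, the test is reported as skipped rather than failed, which is the behavior one would want on non-H-series CI machines.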

Sunny-bot1 · 2025-09-23T07:34:22Z

tests/quantization/test_weight_only.py

+        self.fd_config.load_config = type("load_config", (), {"load_choices": "default_v1"})()
+
+        # float32 weights
+        weight_fp32 = np.random.randn(*self.weight_shape).astype("float32")


Weights are usually of type fp16 or bf16.

Sunny-bot1 · 2025-09-23T07:37:46Z

tests/quantization/test_weight_only.py

+        # Per-channel scale
+        max_abs = np.max(np.abs(weight_fp32), axis=1, keepdims=True) + 1e-6
+        scale = (max_abs / 127.0).astype("float32")
+
+        # Int8 quantization
+        weight_int8 = np.clip(np.round(weight_fp32 / scale), -128, 127).astype("int8")


I suggest calling process_loaded_weights from GPUWeightOnlyLinearMethod here to quantize the weights.
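
For reference, the manual scheme in the quoted hunk can be checked on its own with a round trip. The sketch below repeats the same per-channel formula in NumPy on fp16 weights and verifies that dequantizing recovers each weight to within half a quantization step. It does not call process_loaded_weights, whose exact behavior and output layout are not shown in this thread, so it says nothing about whether the two agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# fp16 weights, upcast to float32 only for the quantization arithmetic.
weight_fp16 = rng.standard_normal((16, 16)).astype("float16")
w = weight_fp16.astype("float32")

# Per-channel scale, same formula as in the quoted hunk.
max_abs = np.max(np.abs(w), axis=1, keepdims=True) + 1e-6
scale = (max_abs / 127.0).astype("float32")  # shape [16, 1]

# Int8 quantization.
weight_int8 = np.clip(np.round(w / scale), -128, 127).astype("int8")

# Round trip: dequantize and measure the per-element error.
dequant = weight_int8.astype("float32") * scale
round_trip_err = np.abs(w - dequant)

# Rounding to the nearest integer moves each value by at most half a step.
half_step = scale / 2
print("worst error / half step:", float((round_trip_err / half_step).max()))
```

If the test switches to process_loaded_weights, a round-trip check of this kind could still serve as an independent reference, provided the packed layout that method produces is accounted for.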

Sunny-bot1 · 2025-09-23T07:39:47Z

tests/quantization/test_weight_only.py

+            return paddle.create_parameter(shape=shape, dtype=dtype, default_initializer=default_initializer)
+
+
+class TestGPUWeightOnlyLinearMethod(unittest.TestCase):


Besides checking the apply method, please also add checks for the process_prequanted_weights and process_loaded_weights methods.

add test_weight_only.py

b5d7f2e

paddle-bot bot added the contributor External developers label Sep 15, 2025

luotao1 mentioned this pull request Sep 15, 2025

【Hackathon 9th】Open Source Contribution Individual Challenge PaddlePaddle/Paddle#74773

Open

luotao1 added the PaddlePaddle Hackathon label Sep 15, 2025

luotao1 assigned luotao1 and YuanRisheng Sep 15, 2025

Echo-Nie and others added 4 commits September 15, 2025 23:31

Merge branch 'PaddlePaddle:develop' into WeightOnly

bf5f448

fix, add precision check

3a11ebe

Merge branch 'develop' into WeightOnly

0427cfa

Merge branch 'PaddlePaddle:develop' into WeightOnly

d84ae97

luotao1 assigned Sunny-bot1 Sep 23, 2025

Sunny-bot1 reviewed Sep 23, 2025
