Float8Tensor per row quantization pass bias to fbgemm kernel #2884
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2884
✅ No failures as of commit 4d70152 with merge base fbe3df9. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Summary: Previously, bias was not passed to the fbgemm kernel for float8 per-row quantization; this PR adds it. The result should be a faster float8 per-row quantized kernel, with no change to numerics or anything else.

Test Plan:
```
python test/dtypes/test_affine_quantized_float.py -k test_expected_kernels_on_gpu
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py
```

stack-info: PR: #2884, branch: jerryzh168/stack/60
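A minimal sketch of the change (simplified and illustrative; the real dispatch lives in torchao's Float8Tensor linear implementation, and `fp8_rowwise_linear` is a hypothetical name used here just to frame the before/after):

```python
import torch

def fp8_rowwise_linear(xq, wq, x_scale, w_scale, bias=None):
    # `xq`/`wq` are the float8 row-quantized activation/weight tensors,
    # `x_scale`/`w_scale` their per-row scales.

    # Before this PR: bias was added as a separate elementwise op after
    # the matmul, which inductor compiled into an extra fused-add kernel:
    #   res = torch.ops.fbgemm.f8f8bf16_rowwise(xq, wq, x_scale, w_scale)
    #   if bias is not None:
    #       res = res + bias

    # After this PR: bias goes straight into the fbgemm kernel's epilogue,
    # so no extra add kernel is generated.
    return torch.ops.fbgemm.f8f8bf16_rowwise(
        xq, wq, x_scale, w_scale, bias=bias
    )
```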
looks reasonable, can we add a test which covers the new path?
Updated the test plan to cover the new path:
```
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py -k test_kernel_preference_numerical_equivalence
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py -k test_expected_gpu_kernel_fbgemm
```
should we have a test? This seems like it was a bug that wasn't caught.
@drisspg yeah, the test is this one:
but it's probably better to have a more exhaustive one for all options (auto and torch) as well. It is a performance "bug", I think: the effect is that we are not using the most efficient path when there is a bias.
"torch.ops.triton.quantize_fp8_row.default", 1 | ||
).check_count("torch.ops.fbgemm.f8f8bf16_rowwise.default", 1).run(code[0]) | ||
).check_count("torch.ops.fbgemm.f8f8bf16_rowwise.default", 1).check_not( | ||
"triton_poi_fused_add_0" |
cc @drisspg we explicitly test that `triton_poi_fused_add_0` is not generated, to make sure there is no additional `res + bias` in this code path
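For reference, a hedged sketch of how such a check can be written end to end (variable names are illustrative; the real test lives in test_float8_tensor.py):

```python
import torch
from torch.testing import FileCheck
from torch._inductor.utils import run_and_get_code

# `model` is assumed to already be quantized with the float8 per-row
# config (see the quantize_ sketch at the end of this page).
compiled = torch.compile(model)
x = torch.randn(32, 128, dtype=torch.bfloat16, device="cuda")

# run_and_get_code returns the eager result plus inductor's generated code.
_, code = run_and_get_code(compiled, x)
FileCheck().check_count(
    "torch.ops.triton.quantize_fp8_row.default", 1
).check_count(
    "torch.ops.fbgemm.f8f8bf16_rowwise.default", 1
).check_not(
    "triton_poi_fused_add_0"  # no separate bias-add kernel expected
).run(code[0])
```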
it's not safe to depend on inductor-generated kernel names such as `triton_poi_fused_add_0`; I think it would be better to ensure that no additional kernels are called after `torch.ops.fbgemm.f8f8bf16_rowwise.default` is called.
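One hedged way to express that suggestion (a sketch only; `triton_poi_fused` is an assumed prefix for inductor's fused pointwise kernels, not a stable API, and the actual updated test may check differently):

```python
# Match the rowwise matmul, then assert no fused pointwise kernel
# (e.g. a bias add) appears anywhere after it in the generated code.
FileCheck().check(
    "torch.ops.fbgemm.f8f8bf16_rowwise.default"
).check_not(
    "triton_poi_fused"  # assumed inductor naming prefix
).run(code[0])
```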
OK, updated
Stacked PRs:
Float8Tensor per row quantization pass bias to fbgemm kernel
Summary:
Previously, bias was not passed to the fbgemm kernel for float8 per-row quantization; this PR adds it.
The result should be a faster float8 per-row quantized kernel, with no change to numerics or anything else.
Test Plan:
```
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py -k test_kernel_preference_numerical_equivalence
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py -k test_expected_gpu_kernel_fbgemm
```
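For context, a hedged usage sketch of the path this PR speeds up: a linear layer with bias quantized to float8 dynamic activation + float8 weight at per-row granularity. Config and function names follow torchao's public API around the time of this PR; the layer shapes are arbitrary.

```python
import torch
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    PerRow,
    quantize_,
)

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256, bias=True, dtype=torch.bfloat16, device="cuda")
)
# Quantize weights (and dynamically quantize activations) to float8,
# with per-row scales — the granularity this PR's fbgemm path handles.
quantize_(model, Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()))

# After this PR, under compile, the bias is handled inside the fbgemm
# rowwise kernel rather than as a separate fused-add kernel.
compiled = torch.compile(model)
out = compiled(torch.randn(32, 128, dtype=torch.bfloat16, device="cuda"))
```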