[Kernel] AQ AZP 3/4: Asymmetric quantization kernels #7270

Merged 29 commits on Sep 16, 2024
9c5b95b
DRAFT dynamic azp quant kernel - failing non-deterministically
ProExpertProg Jul 22, 2024
c0916db
Fixed blockReduce bug! Also using round-to-even for azp
ProExpertProg Jul 23, 2024
69f9493
Remove scale adjustment
ProExpertProg Jul 23, 2024
15e4c72
Fixed saturation in kernel
ProExpertProg Jul 23, 2024
6353a8b
Integer allclose comparison
ProExpertProg Jul 23, 2024
a95790a
utils fix
ProExpertProg Jul 24, 2024
84db5cd
Fixed torch ref conversion
ProExpertProg Jul 24, 2024
d11340c
Format
ProExpertProg Jul 24, 2024
fe91441
Inverted azp sign to be consistent with RFC, unit tests, and compress…
ProExpertProg Jul 24, 2024
f769c99
Fix order of rounding in test (doesn't matter for small numbers, just…
ProExpertProg Jul 25, 2024
9e49812
Fewer tests
ProExpertProg Jul 25, 2024
c1ad358
Static per-tensor kernels added
ProExpertProg Jul 25, 2024
25d0f58
Reduced test size, fixed custom_ops wrapper
ProExpertProg Jul 27, 2024
e05068c
format
ProExpertProg Aug 6, 2024
5d249fe
Merge remote-tracking branch 'refs/remotes/upstream/main' into luka/a…
ProExpertProg Aug 27, 2024
d02c568
Merge fixes
ProExpertProg Aug 27, 2024
e4dc101
Fix for AMD build
ProExpertProg Aug 29, 2024
31b3e44
PR comments: Python nits
ProExpertProg Sep 10, 2024
5a9762e
PR comments: saturation code
ProExpertProg Sep 10, 2024
8aed02a
explicit nearest rounding mode
ProExpertProg Sep 10, 2024
557db87
Added rounding mode guard
ProExpertProg Sep 10, 2024
2b24032
Rounding mode stuff removed, added comment
ProExpertProg Sep 10, 2024
5e9a0cb
Fixed test
ProExpertProg Sep 10, 2024
65b2f9c
Improved nearbyint rounding comment
ProExpertProg Sep 10, 2024
45e1d9e
Added saturating cast test
ProExpertProg Sep 10, 2024
2232b6d
Fixed scaled_int8_quant in qqq
ProExpertProg Sep 11, 2024
8df3b2d
Merge remote-tracking branch 'upstream/main' into luka/aq-azp-kernels
ProExpertProg Sep 11, 2024
04a539e
Fixed ops_check & azp test atol
ProExpertProg Sep 12, 2024
a3b9f6a
Fixed cpu bindings
ProExpertProg Sep 12, 2024
Fixed scaled_int8_quant in qqq
ProExpertProg committed Sep 11, 2024
commit 2232b6dd8ced9aa850c394e29d32cbbd253fd79c
2 changes: 1 addition & 1 deletion vllm/model_executor/layers/quantization/qqq.py
@@ -260,7 +260,7 @@ def apply(
         size_k = x_2d.shape[1]
         size_n = s_ch.shape[1]
 
-        x_int8, s_tok = ops.scaled_int8_quant(x_2d)
+        x_int8, s_tok, _ = ops.scaled_int8_quant(x_2d)
 
         output_2d = ops.marlin_qqq_gemm(x_int8, qweight, s_tok, s_ch, s_group,
                                         workspace, size_m, size_n, size_k)
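
For context, the one-line change above implies that `ops.scaled_int8_quant` now returns three values instead of two, and that the QQQ path discards the third. The sketch below is a pure-Python illustration of why such a call site has to change. It is not vLLM's kernel: the function name `scaled_int8_quant_sketch`, the `symmetric` flag, the `None` zero point in the symmetric case, and the convention `q = round(x / scale) + azp` are all assumptions made for illustration, based only on the PR title and commit messages (round-to-even, saturation).

```python
from typing import List, Optional, Tuple


def scaled_int8_quant_sketch(
    row: List[float], symmetric: bool = True
) -> Tuple[List[int], float, Optional[int]]:
    """Hypothetical reference: quantize one row to int8.

    Returns (quantized values, scale, zero point). The zero point is
    None in the symmetric case (an assumption, not confirmed by the diff).
    """
    if symmetric:
        absmax = max(abs(v) for v in row)
        scale = absmax / 127.0 if absmax > 0 else 1.0
        azp: Optional[int] = None
        offset = 0
    else:
        lo, hi = min(row), max(row)
        scale = (hi - lo) / 255.0 if hi > lo else 1.0
        # Assumed convention: q = round(x / scale) + azp, so that the
        # row minimum maps to -128. Python's round() rounds half to even.
        offset = round(-128 - lo / scale)
        azp = offset
    # Saturate to the int8 range after adding the zero point.
    q = [max(-128, min(127, round(v / scale) + offset)) for v in row]
    return q, scale, azp


# A symmetric-only caller, like the QQQ path in the diff, ignores the zero point.
x_int8, s_tok, _ = scaled_int8_quant_sketch([-127.0, 0.0, 63.5])
```

With a three-value return, the old two-name unpacking raises `ValueError: too many values to unpack`, which is the kind of breakage this commit addresses in `qqq.py`.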