[Kernel] Initial Activation Quantization Support #4525
Merged
robertgshaw2-neuralmagic merged 49 commits into vllm-project:main from neuralmagic:ds-quant on May 23, 2024
Commits
4d27a2c Initial `CompressedTensors` config + Activation Quantization support … (dsikka)
92b3703 add get_quant method to compressed tensors config (dsikka)
2a3eb83 small rebase fixed (dsikka)
3dd1fe8 format (dsikka)
f2f8c52 fix mypy complaints
c9308eb Merge branch 'main' into ds-quant (dsikka)
d9d49b5 format fixes (dsikka)
b111ee6 Merge branch 'main' into ds-quant (dsikka)
c31a7af format fix post rebase (dsikka)
ca01b39 lazy import CompressedTensorsW8A8StaticTensor (#220) (varun-sundar-rabindranath)
f0197d4 lazy cutlass_gemm_dq import (#221) (varun-sundar-rabindranath)
4624b46 fix asm
75757d5 update shape change (dsikka)
e1df0eb add todo (dsikka)
bc0991c Rename quant_per_tensor -> static_scaled_int8_quant
74ad650 Remove cruft
43c43f3 Merge branch 'main' into ds-quant (dsikka)
cf5600f fixes : typo
169ce7f py-cutlass temporary hack for num_prompts==1
03b53e7 yapf
f9df31b add test_int8_quant
ba4b6b3 call cpp cutlass
3c223c6 Merge branch 'main' into ds-quant (dsikka)
b27f31a remove cutlass py interface
b589cdd format.sh
98159cf remove fake-quant
8dbeb31 add compressed tensors test (dsikka)
5eeb40a remove torch.int8 (dsikka)
c55e023 format (dsikka)
f5cbbd3 fix config parsing to match new model (dsikka)
a685957 revert parsing to use default pathway (dsikka)
4dfb37f PR comments (dsikka)
de81f9e Fix scales/zero-points device allocation
15f1863 ruff
bd53847 add better comments
b2926f3 add comment (dsikka)
1274386 Merge branch 'main' into ds-quant (dsikka)
18640c8 clang format (dsikka)
5c5dc84 clang format again (dsikka)
a44b4a0 address PR comments
6f0e6e1 clang-format
0090454 remove layer name (dsikka)
4b10fd7 remove unused import (dsikka)
68a59c7 remove parent name (dsikka)
b0afe67 Fix rounding
4f4951e comment
869de3f cruft
e68e391 yapf
d77cf50 remove unquantized check (dsikka)
Merge branch 'main' into ds-quant
commit 43c43f3c494afc7b55919a1f83609fbb07d7e8eb
The name `compressed_tensors` is a bit unclear to me. I suppose this is the official method name in SparseML? If so, can we just use `sparseml` here?

`compressed-tensors` is the name of the package responsible for saving quantized and sparse models.
So the flow is:
`safetensors` with a `compressed-tensors` config