groupsize consistency #417
Conversation
Summary: half of the APIs used groupsize and half used group_size; swapping them all to groupsize.

Test Plan:
python eval.py -q int8wo --limit 1
wikitext: {'word_perplexity,none': 12.204889603121593, 'byte_perplexity,none': 1.5965674184201175, 'bits_per_byte,none': 0.6749734750293632, 'alias': 'wikitext'}
python generate.py --quantization int4wo-64
Average tokens/sec: 13.93
Average Bandwidth: 52.04 GB/s
Peak Memory Usage: 15.92 GB
Model Size: 3.74 GB
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/417
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 867ec9a with merge base ef1e745.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Can you change it to group_size?
I'm kind of expecting this to fail CI somewhere; will fix after the issue is identified.
Yeah, +1 on group_size.
How about block_size, used by choose_qparams_affine? Though it is a bit different, as it expects a tuple instead of an int (like group_size).
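For concreteness, here is a minimal sketch (plain PyTorch, not the actual choose_qparams_affine signature) of how an integer group_size on a 2D weight maps onto a tuple-style block_size: grouping along the last dim corresponds to blocks of shape (1, group_size).

```python
import torch

# Sketch only: relates an int group_size (last-dim grouping) to a block_size tuple.
weight = torch.randn(8, 64)          # (out_features, in_features)
group_size = 16                      # int form: elements per group along the last dim
block_size = (1, group_size)         # tuple form: one row, group_size columns per block

# Tiling the weight by block_size yields one block per (row, group) pair.
n_blocks = tuple(dim // blk for dim, blk in zip(weight.shape, block_size))
print(n_blocks)  # (8, 4): 8 rows, 4 groups of 16 elements each
```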
@jerryzh168 I understand your point. But it is a bit confusing, since ...
What do you mean by ...?
@jerryzh168 yes, I understand that part. I think my confusion comes from the non-standardized use of the words "group" and "block". Do people always mean "group" as consecutive elements along a dimension (typically the last dim) and "block" as something more general (it can have a 2D structure, like a (64, 64) tile, for example)? (Per-channel quant can also be viewed as group-wise quant with group_size = channel length; per-tensor quant can also be viewed as group-wise quant by flattening the tensor and using group_size = tensor size.) For example, the 8-bit Adam paper from bnb (https://arxiv.org/pdf/2110.02861) uses the word "block_size" here, but I think it is actually "group_size" according to the usage in torchao? Happy to be corrected if I have a wrong understanding.
Ah I see. I think in torchao we just use group_size to indicate the group size for the last dimension of a 2D tensor; this is where we were using it before: https://github.com/pytorch/ao/pull/321/files#diff-7c9b4c8c6d4ef9c47873263304a335d5cf56c3ac9f98ba10b994cd80dc9c2709L652-L654. So it will just be a single number indicating how many elements we want in the same group. block_size is a more general term, I think.
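As a sanity check on that reading, here is a small sketch (plain PyTorch, not the torchao implementation) of group-wise absmax scales along the last dimension; setting group_size to the channel length recovers per-channel scales, and flattening the tensor with group_size equal to the number of elements recovers a single per-tensor scale.

```python
import torch

def groupwise_absmax_scales(x: torch.Tensor, group_size: int) -> torch.Tensor:
    """Absmax scale per group of `group_size` consecutive elements along the last dim (sketch only)."""
    out_features, in_features = x.shape
    assert in_features % group_size == 0
    groups = x.reshape(out_features, in_features // group_size, group_size)
    return groups.abs().amax(dim=-1)          # shape: (out_features, n_groups)

w = torch.randn(4, 8)

# group-wise: two groups of four elements per row
print(groupwise_absmax_scales(w, group_size=4).shape)                          # torch.Size([4, 2])

# per-channel quant == group-wise with group_size = channel length
print(groupwise_absmax_scales(w, group_size=8).shape)                          # torch.Size([4, 1])

# per-tensor quant == group-wise over the flattened tensor
print(groupwise_absmax_scales(w.reshape(1, -1), group_size=w.numel()).shape)   # torch.Size([1, 1])
```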
* Revert "Revert "Embedding quantization per backend (pytorch#402)" (pytorch#411)"

  This reverts commit 8b35acdff4fded779799ab8a419e55f885dd8918.

* merge GGUF tests into pull.yml