Summary:
Note: int4/int8 functionality was removed to simplify the refactor. Whoever
uses this script next and needs that functionality can add it back.
Test Plan:
```
pytest test/prototype/test_mixed_precision.py -s -x
```
Reviewers:
Subscribers:
Tasks:
Tags:
ghstack-source-id: 2b0f46d
ghstack-comment-id: 2706815985
Pull Request resolved: #1854
```diff
-    Apply int N-bit weight only quantization to a linear layer.
+    Configuration for applying int N-bit weight only quantization to a linear layer.

     Args:
         `group_size`: parameter for quantization, controls the granularity of quantization, smaller size is more fine grained, choices are [512, 256, 128, 64, 32]
         `n`: number of bits to quantize to, choices are [8, 6, 5, 4, 3, 2]
```
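For context, here is a minimal sketch of what group-wise N-bit weight-only quantization computes. The function name `quantize_weight_groupwise` and the exact rounding/clamping choices are illustrative assumptions, not the torchao implementation: each group of `group_size` weights along the input dimension shares one scale, so a smaller `group_size` tracks the local weight distribution more closely at the cost of storing more scales.

```python
import torch

def quantize_weight_groupwise(w: torch.Tensor, n: int = 4, group_size: int = 32):
    """Illustrative group-wise symmetric N-bit quantization of a 2D weight.

    Hypothetical sketch, not the torchao API: each row is split into groups
    of `group_size` columns, and every group gets its own scale.
    """
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    groups = w.reshape(out_features, in_features // group_size, group_size)
    # Symmetric n-bit integer range: [-2^(n-1), 2^(n-1) - 1]
    qmax = 2 ** (n - 1) - 1
    # One scale per group, chosen so the group's max magnitude maps to qmax
    scales = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(groups / scales), -qmax - 1, qmax)
    return q.to(torch.int8), scales  # int8 container holds any n <= 8

# Dequantize to inspect the error introduced by quantization:
w = torch.randn(4, 64)
q, scales = quantize_weight_groupwise(w, n=4, group_size=32)
w_hat = (q.float() * scales).reshape_as(w)
print((w - w_hat).abs().max())
```

This illustrates the tradeoff the docstring describes: shrinking `group_size` or raising `n` reduces the reconstruction error printed above, while increasing the amount of scale metadata (one scale per group) or bits stored per weight.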