Open
Description
See https://github.com/FMInference/FlexGen for reference: they have explored storing the cache with 4-bit quantization.
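
For anyone unfamiliar with the idea, here is a rough sketch of what storing a cache tensor at 4 bits could look like. This is not FlexGen's code; it assumes simple group-wise min-max quantization in NumPy, and the function names and the `group_size` parameter are made up for illustration. Each group of values keeps its own scale and minimum, and two 4-bit codes are packed into one byte.

```python
import numpy as np

def quantize_4bit(x, group_size=64):
    """Illustrative group-wise 4-bit quantization (hypothetical helper, not FlexGen's API)."""
    # Assumes x.size is divisible by group_size and group_size is even.
    flat = x.astype(np.float32).reshape(-1, group_size)
    lo = flat.min(axis=1, keepdims=True)
    hi = flat.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0          # 4 bits give 16 levels: codes 0..15
    scale[scale == 0] = 1.0           # constant groups: avoid division by zero
    q = np.clip(np.round((flat - lo) / scale), 0, 15).astype(np.uint8)
    packed = (q[:, 0::2] << 4) | q[:, 1::2]   # two 4-bit codes per byte
    return packed, scale, lo

def dequantize_4bit(packed, scale, lo, shape):
    """Unpack the 4-bit codes and map them back to approximate float values."""
    q = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    q[:, 0::2] = packed >> 4          # high nibble
    q[:, 1::2] = packed & 0x0F        # low nibble
    return (q.astype(np.float32) * scale + lo).reshape(shape)
```

With this scheme the packed codes take half a byte per element, plus one scale and one minimum per group, and the reconstruction error per element is at most about half of that group's scale.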