A 4-bit TTL computer
Updated Oct 1, 2024 - Python
An FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization"
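The 4-bit weight storage behind entries like the two above can be sketched in a few lines of plain Python. This is an illustration only, not code from either repository: the function names are hypothetical, and it assumes symmetric quantization with a single scale shared by the whole tensor and two 4-bit codes packed per byte.

```python
def quantize_int4(values):
    """Symmetric 4-bit quantization with one scale for the whole list (assumed scheme)."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale for all-zero input
    scale = max_abs / 7.0                         # map the largest magnitude to +/-7
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def pack_nibbles(codes):
    """Pack two signed 4-bit codes into each byte, low nibble first."""
    if len(codes) % 2:
        codes = codes + [0]                       # pad to an even number of codes
    out = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_nibbles(packed, count):
    """Recover `count` signed 4-bit codes from packed bytes."""
    codes = []
    for byte in packed:
        for nib in (byte & 0xF, byte >> 4):
            codes.append(nib - 16 if nib >= 8 else nib)  # sign-extend the nibble
    return codes[:count]


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


weights = [0.7, -0.3, 0.1, -0.7, 0.0, 0.2]
codes, scale = quantize_int4(weights)
packed = pack_nibbles(codes)
restored = dequantize(unpack_nibbles(packed, len(weights)), scale)
fp16_bytes = 2 * len(weights)                     # the same weights stored as 16-bit floats
```

Here `packed` takes a quarter of the bytes that `fp16_bytes` counts (ignoring the one stored scale). When inference is limited by how fast weights can be read from memory, that ratio is the usual basis for an ideal speedup of about 4x.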