4bit

Here are 4 public repositories matching this topic...

A 4-bit TTL computer

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization"

Fused 4-bit AdamW in CUDA
