This repo contains the first working balanced ternary weights for modern LLMs.
- 1 trit per weight = log₂ 3 ≈ 1.58 bits (balanced ternary {−1, 0, +1} is a natural fit for the zero-symmetric distribution of LLM weights; see the packing sketch after this list)
- Smaller than Q3_K, potentially better than Q4_0 on some models
- Full round-trip safetensors → .gguf converter
- Works today with llama.cpp / Ollama / LM Studio
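As a concrete reference for the density bullet above, here is a minimal sketch of one fixed-rate layout: five balanced trits packed into one byte (3⁵ = 243 ≤ 256), i.e. 1.6 bits/trit. This is an illustration, not necessarily the actual T3_K on-disk format, and the `pack5`/`unpack5` helpers are hypothetical; the Huffman coding described further down is what gets below 1.6 bits.

```python
# Hypothetical fixed-rate packing: 5 balanced trits per byte (3^5 = 243 <= 256).
# Illustrative only -- the real T3_K layout may differ.

def pack5(trits):
    """trits: ints in {-1, 0, +1}, length a multiple of 5."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        byte = 0
        for t in reversed(trits[i:i + 5]):
            byte = byte * 3 + (t + 1)          # map {-1, 0, +1} -> {0, 1, 2}
        out.append(byte)
    return bytes(out)

def unpack5(data, n):
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)         # map {0, 1, 2} -> {-1, 0, +1}
            byte //= 3
    return trits[:n]

w = [0, 1, -1, 0, 0, -1, 1, 0, 0, 0]
assert unpack5(pack5(w), len(w)) == w          # lossless round trip
print(f"{8 * len(pack5(w)) / len(w):.1f} bits/trit")   # 1.6
```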
| Method | Size | WikiText-2 PPL |
|---|---|---|
| FP16 | 4.80 GB | 5.91 |
| Q4_K_M | 2.80 GB | 6.48 |
| T3_K | 2.40 GB | 6.38 |
→ ~14% smaller than Q4_K_M (2.80 GB → 2.40 GB) at lower perplexity
```sh
./t81z gemma-2b-it-safetensors/ --to-gguf gemma-2b-t3.gguf
./llama.cpp/main -m gemma-2b-t3.gguf -p "The meaning of life is" -n 256
```

Real LLM weights in balanced ternary:
```
Entropy         : 1.12 bits/trit
Huffman         : 1.19 bits/trit   (94% of the 1.12-bit entropy limit)
Raw binary (i8) : 8.00 bits/value
```

→ 6.7× denser than int8, ~3× denser than Q2_K
→ This is why T3_K models come in smaller than Q4_K_M at the same or better perplexity.
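The entropy and Huffman figures above are easy to sanity-check. The toy calculation below assumes an i.i.d. trit distribution with 73% zeros (an assumption for illustration; real tensors are not i.i.d., which is why the repo measures 1.19 bits/trit where this toy model lands near 1.13). Huffman-coding pairs of trits is what lets the code dip below the 1-bit-per-symbol floor of a single-trit code.

```python
# Toy check of bits/trit. ASSUMPTION: i.i.d. trits with P(0) = 0.73 and
# P(+1) = P(-1) = 0.135 -- illustrative, not any model's measured stats.
import heapq
import itertools
import math

def entropy(probs):
    """Shannon entropy in bits/symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_avg_bits(probs):
    """Average Huffman code length = sum of all internal-node probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

p = {0: 0.73, +1: 0.135, -1: 0.135}
pairs = [p[a] * p[b] for a, b in itertools.product(p, repeat=2)]

H = entropy(p.values())          # ~1.11 bits/trit
L = huffman_avg_bits(pairs) / 2  # ~1.13 bits/trit with pair coding
print(f"entropy : {H:.2f} bits/trit")
print(f"huffman : {L:.2f} bits/trit")
print(f"vs int8 : {8 / L:.1f}x denser")
```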
The future is ternary.