t81dev/ternary

Ternary Quantization for LLMs: Implement balanced ternary (T3_K) weights for 2.63-bit quantization—the first working solution for modern large language models.
Trinity Ecosystem.

Ternary LLMs — 2.63-bit balanced ternary quantization

This repo contains the first working balanced ternary weights for modern LLMs.

  • One balanced-ternary trit per weight (3 values: −1, 0, +1); the T3_K format averages ≈ 2.63 bits per weight on disk (a trit itself carries log2 3 ≈ 1.58 bits of information). Balanced ternary is symmetric around zero, a natural fit for the roughly zero-centered distribution of LLM weights; see the quantization sketch after this list
  • Smaller than Q3_K, and potentially better than Q4_0 on some models
  • Full round-trip safetensors → .gguf converter
  • Works today with llama.cpp / Ollama / LM Studio
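
The repo's actual T3_K algorithm isn't reproduced here. As a minimal sketch of what balanced-ternary quantization looks like, here is a per-tensor version in Python; the threshold rule, function names, and single scale are illustrative assumptions (a K-quant-style format would use per-block scales):

import numpy as np

def quantize_ternary(w: np.ndarray, threshold_ratio: float = 0.05):
    # Snap weights to balanced ternary {-1, 0, +1}: values within
    # threshold_ratio * max|w| of zero become 0. The 0.05 ratio is
    # an arbitrary illustrative choice, not the T3_K rule.
    t = threshold_ratio * np.abs(w).max()
    trits = np.zeros(w.shape, dtype=np.int8)
    trits[w > t] = 1
    trits[w < -t] = -1
    # One scale for the whole tensor: mean magnitude of surviving weights.
    nz = trits != 0
    scale = float(np.abs(w[nz]).mean()) if nz.any() else 1.0
    return trits, scale

def dequantize_ternary(trits: np.ndarray, scale: float) -> np.ndarray:
    return trits.astype(np.float32) * scale

Round-tripping a layer through quantize_ternary / dequantize_ternary gives a quick feel for the reconstruction error before committing to a full model conversion.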

Results so far (Gemma-2-2B-IT)

Method   Size      WikiText-2 PPL
FP16     4.80 GB   5.91
Q4_K_M   2.80 GB   6.48
T3_K     2.40 GB   6.38

≈14% smaller than Q4_K_M (2.40 GB vs 2.80 GB), with lower perplexity (6.38 vs 6.48)

One-command conversion

./t81z gemma-2b-it-safetensors/ --to-gguf gemma-2b-t3.gguf
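
For intuition about how trits end up compact on disk: 3^5 = 243 ≤ 256, so five trits fit in one byte (1.6 bits/trit of payload). This base-243 packing is a standard trick; whether the converter uses exactly this layout is an assumption:

import numpy as np

PLACE = np.array([1, 3, 9, 27, 81])  # 3^0 .. 3^4

def pack5(trits: np.ndarray) -> np.ndarray:
    # Pack trits in {-1, 0, +1} (length a multiple of 5) into bytes,
    # reading each group of five as a base-3 number with digits 0..2.
    t = (np.asarray(trits, dtype=np.int64) + 1).reshape(-1, 5)
    return (t * PLACE).sum(axis=1).astype(np.uint8)  # max value is 242

def unpack5(packed: np.ndarray) -> np.ndarray:
    b = np.asarray(packed, dtype=np.int64)[:, None]
    return ((b // PLACE) % 3 - 1).reshape(-1)  # back to {-1, 0, +1}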

Run

./llama.cpp/main -m gemma-2b-t3.gguf -p "The meaning of life is" -n 256

Ternary Compresses Better Than Binary

Real LLM weights in balanced ternary:

Entropy            : 1.12 bits/trit
Huffman            : 1.19 bits/trit (1.12 / 1.19 ≈ 94% of the theoretical limit)
Raw binary (int8)  : 8.00 bits/value

→ 8.00 / 1.19 ≈ 6.7× denser than int8, and ≈3× denser than Q2_K. This is why T3_K models come out smaller than Q4_K_M at the same or better PPL.
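
The measurement pipeline behind those numbers isn't shown in this README; the sketch below computes entropy and Huffman bits/trit on a synthetic trit stream. The probabilities are made up (chosen so the entropy lands near 1.12), so the printed values only illustrate the method, not the repo's measurements:

import heapq
import numpy as np

def entropy_bits(counts: np.ndarray) -> float:
    # Shannon entropy in bits/symbol of an empirical distribution.
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def huffman_avg_bits(counts: np.ndarray) -> float:
    # Average Huffman code length in bits/symbol, via the classic
    # sum-of-merged-weights identity (no explicit tree needed).
    heap = [float(c) for c in counts]
    heapq.heapify(heap)
    cost = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        cost += a + b
        heapq.heappush(heap, a + b)
    return cost / float(counts.sum())

rng = np.random.default_rng(0)
trits = rng.choice([-1, 0, 1], size=100_000, p=[0.135, 0.73, 0.135])

counts = np.unique(trits, return_counts=True)[1]
print(entropy_bits(counts))      # ~1.11 bits/trit
print(huffman_avg_bits(counts))  # ~1.27 bits/trit (coding single trits)

# Huffman over trit *pairs* (9 symbols) gets much closer to the entropy,
# which is presumably how sub-1.2 bits/trit figures are reached:
pairs = 3 * (trits[0::2] + 1) + (trits[1::2] + 1)
print(huffman_avg_bits(np.unique(pairs, return_counts=True)[1]) / 2)  # ~1.13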

The future is ternary.
