Tags: PrismML-Eng/llama.cpp
Merge pull request #32 from Vort3xed/vulkan-q2_0-kernel vulkan: Q2_0
Merge pull request #1 from PrismML-Eng/mmq [cuda] Fix mmq/mma path
ggml: add Q1_0 and Q1_0_g128 1-bit quantization support (CPU, Metal, CUDA)

Adds two 1-bit quantization types:
- Q1_0: block size 32, ~1.5 bpw
- Q1_0_g128: block size 128, ~1.125 bpw

Backend support: CPU (x86 SSE/AVX + ARM NEON), Metal, CUDA. Kernel implementations follow Q4_0 as boilerplate, adapted for 1-bit sign-based dequantization. CUDA MMQ kernels included but disabled (cuBLAS fallback used for prompt processing) pending accuracy debugging.

Made-with: Cursor
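
The quoted bits-per-weight figures for Q1_0 and Q1_0_g128 are consistent with one sign bit per weight plus a single 16-bit scale per block, as in Q4_0's fp16 scale. The commit message does not spell out the block layout, so the sketch below is an illustration under that assumption, not the repository's implementation. The function names, the mean-absolute-value scale, and the bit ordering are all hypothetical choices made for the example.

```python
def bits_per_weight(block_size, scale_bits=16):
    # Assumption: each block stores one sign bit per weight plus
    # one shared scale of `scale_bits` bits (fp16 in Q4_0).
    return (block_size + scale_bits) / block_size


def quantize_block(values):
    # Hypothetical scheme: scale is the mean absolute value of the
    # block; a set bit means the weight is non-negative.
    n = len(values)
    scale = sum(abs(v) for v in values) / n
    bits = bytearray(n // 8)
    for i, v in enumerate(values):
        if v >= 0:
            bits[i // 8] |= 1 << (i % 8)
    return scale, bytes(bits)


def dequantize_block(scale, bits, n):
    # Sign-based dequantization: every weight becomes +scale or -scale.
    return [scale if (bits[i // 8] >> (i % 8)) & 1 else -scale
            for i in range(n)]


if __name__ == "__main__":
    print(bits_per_weight(32))    # Q1_0 block size
    print(bits_per_weight(128))   # Q1_0_g128 block size
```

Under this assumption a 32-weight block costs 48 bits and a 128-weight block costs 144 bits, which reproduces the ~1.5 and ~1.125 bpw figures. The larger block amortizes the scale over more weights, at the cost of a coarser per-block scale.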