Tags: PrismML-Eng/llama.cpp

prism-b8849-747eb36

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Merge pull request #32 from Vort3xed/vulkan-q2_0-kernel

vulkan: Q2_0

prism-b8846-d104cf1

release-prism: install spirv-headers for ubuntu-arm64 vulkan build

prism-b8796-e2d6742

Remove Windows CUDA 12.8 (not supported by setup action)

prism-b8201-ba7e817

fix windows-hip artifact path

prism-b8196-f5dda72

Verified

Merge pull request #8 from PrismML-Eng/cpu-fixes

some cpu fixes; getting ready for upstream PR; e.g. id 40 is taken by…

prism-b8194-1179bfc

add slim release workflow for prism

prism-b8194-c3528ba

add slim release workflow for prism

v0.0.2-prism

Verified

Merge pull request #1 from PrismML-Eng/mmq

[cuda] Fix mmq/mma path

v0.0.1-prism

ggml: add Q1_0 and Q1_0_g128 1-bit quantization support (CPU, Metal, CUDA)

Adds two 1-bit quantization types:
- Q1_0: block size 32, ~1.5 bpw
- Q1_0_g128: block size 128, ~1.125 bpw
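
The quoted bits-per-weight figures are consistent with one sign bit per weight plus a single 16-bit scale per block. That layout is an assumption here (it mirrors the fp16 scale used by Q4_0, which the commit names as its template), and the helper name `bits_per_weight` is hypothetical. A minimal sketch of the arithmetic:

```python
def bits_per_weight(block_size: int, scale_bits: int = 16, bits_per_value: int = 1) -> float:
    """Average storage cost per weight for a block sharing one scale.

    Assumes each block holds `block_size` values of `bits_per_value` bits
    plus one scale of `scale_bits` bits (assumed fp16, as in Q4_0).
    """
    return (block_size * bits_per_value + scale_bits) / block_size


q1_0_bpw = bits_per_weight(32)        # (32 + 16) / 32
q1_0_g128_bpw = bits_per_weight(128)  # (128 + 16) / 128

print(q1_0_bpw, q1_0_g128_bpw)  # 1.5 1.125
```

Under this assumption, the larger block amortizes the scale over four times as many weights, which accounts for the drop from ~1.5 to ~1.125 bpw.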

Backend support: CPU (x86 SSE/AVX + ARM NEON), Metal, CUDA.
Kernel implementations follow Q4_0 as boilerplate, adapted for
1-bit sign-based dequantization.
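
As a rough illustration of what 1-bit sign-based dequantization can look like, the sketch below packs one block into a scale and a bitfield of signs, then expands it back. Everything in it is an assumption rather than the kernel's actual layout: the function names are hypothetical, the scale is taken as the mean absolute value, a set bit is read as `+scale`, bits are packed least-significant first, and the scale is kept as a Python float instead of fp16.

```python
def quantize_block(values):
    """Pack one block of floats into (scale, sign_bytes).

    Assumed scheme: scale = mean |v|; bit i is 1 when values[i] >= 0.
    """
    n = len(values)
    assert n % 8 == 0, "block size must be a multiple of 8"
    scale = sum(abs(v) for v in values) / n
    packed = bytearray(n // 8)
    for i, v in enumerate(values):
        if v >= 0:
            packed[i // 8] |= 1 << (i % 8)  # LSB-first within each byte
    return scale, bytes(packed)


def dequantize_block(scale, packed):
    """Expand sign bits back to floats: bit 1 -> +scale, bit 0 -> -scale."""
    out = []
    for i in range(len(packed) * 8):
        bit = (packed[i // 8] >> (i % 8)) & 1
        out.append(scale if bit else -scale)
    return out


block = [0.5, -1.5] * 16  # one 32-value block, as in Q1_0
scale, packed = quantize_block(block)
restored = dequantize_block(scale, packed)
```

Only the sign of each weight survives; magnitudes collapse to the shared per-block scale, which is why the choice of block size matters for accuracy as well as for size.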

CUDA MMQ kernels included but disabled (cuBLAS fallback used for
prompt processing) pending accuracy debugging.

Made-with: Cursor

stable

ggml: add Q1_0 and Q1_0_g128 1-bit quantization support (CPU, Metal, CUDA)