Closed
First of all: CONGRATS ON YOUR AMAZING RESEARCH WORK.
Considering that this is using GGML and seems based directly on llama.cpp:
Why is this a separate project to llama.cpp, given that llama.cpp already supports BitNet ternary quants? (ggml-org/llama.cpp#8151)
Are these simply more optimised kernels?
If so, how do they compare to llama's implementation?
Can/should they be contributed back to llama.cpp?