This repository was archived by the owner on Jun 24, 2024. It is now read-only.
Good ideas from llama.cpp #15
Closed
Labels: issue:enhancement (New feature or request)
Description
I've been tracking the llama.cpp repo. I'll use this issue to list any good ideas / things we should be aware of to keep up with in Rust land:
- GPTQ quantization 👀 See "GPTQ Quantization (3-bit and 4-bit)" (ggml-org/llama.cpp#9).
- Not sure how that is even possible (isn't the task I/O bound?), but people are claiming great speedups when loading the model in parallel. This should be pretty easy to implement using `rayon`. See "Faster loading of the model" (ggml-org/llama.cpp#85 (comment)).
- Seems there's an issue with the normalization function used. It should be RMSNorm. Would be good to keep an eye on this, and simply swap the `ggml` function once it's implemented on the C++ side 👀 See "Use RMSNorm" (ggml-org/llama.cpp#173 (comment)).
- It looks like dropping to F16 for memory_k and memory_v reduces memory usage. It is not known whether this hurts quality, but we should follow the C++ side and add a flag to drop to F16 for the memory. This would also make the cached prompts added as part of "Implementation of prompt caching" (#14) take half the size on disk, which is a nice bonus. See "Use F16 for memory_k and memory_v (as suggested in #146)" (ggml-org/llama.cpp#154 (review)).
- Looks like the fix from "bytesFromNibbles error" (#1) just landed upstream. We should make sure to fix it here too. See "FIX: "inline" -> "static inline" for bytesFromNibbles and packNibbles" (ggml-org/llama.cpp#161).
- The tokenizer used in llama.cpp has some issues. It would be better to use `sentencepiece`, which is the one that was used during the original LLaMA training. There seems to be a Rust crate for sentencepiece. We should check if a drop-in replacement is possible. See "Differences with the llama tokenizer" (ggml-org/llama.cpp#167).
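For reference on the RMSNorm item, here is a minimal plain-Rust sketch of what the swap would mean numerically. It is not the `ggml` function and not what either repo currently ships; `rms_norm`, `layer_norm`, and the `eps` parameter are hypothetical names for illustration, and the learned weight/bias that a real model applies afterwards is left out.

```rust
/// RMSNorm (illustrative): divide every element by the root-mean-square
/// of the vector. Unlike LayerNorm, the mean is NOT subtracted first.
/// `eps` is an assumed small constant that avoids division by zero.
fn rms_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().map(|v| v * scale).collect()
}

/// LayerNorm without learned parameters, shown only for comparison:
/// subtract the mean, then divide by the standard deviation.
fn layer_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n;
    let scale = 1.0 / (var + eps).sqrt();
    x.iter().map(|v| (v - mean) * scale).collect()
}
```

The two only agree when the input already has zero mean, so using one in place of the other changes the activations for any input that does not.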
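To put a rough number on the F16 item: the sketch below assumes memory_k and memory_v each hold `n_layer * n_ctx * n_embd` elements, which is my reading of the layout rather than something confirmed here. `kv_memory_bytes` is a hypothetical helper, and the 7B-style values in the comments (32 layers, 4096 embedding width, 512-token context) are illustrative.

```rust
/// Approximate bytes used by memory_k plus memory_v together.
/// Assumes each buffer stores n_layer * n_ctx * n_embd elements.
fn kv_memory_bytes(n_layer: usize, n_ctx: usize, n_embd: usize, bytes_per_elem: usize) -> usize {
    2 * n_layer * n_ctx * n_embd * bytes_per_elem
}

/// Same quantity expressed in MiB, for easier reading.
fn kv_memory_mib(n_layer: usize, n_ctx: usize, n_embd: usize, bytes_per_elem: usize) -> usize {
    kv_memory_bytes(n_layer, n_ctx, n_embd, bytes_per_elem) / (1024 * 1024)
}

// Illustrative values only:
//   kv_memory_mib(32, 512, 4096, 4) -> 512 (F32, 4 bytes per element)
//   kv_memory_mib(32, 512, 4096, 2) -> 256 (F16, 2 bytes per element)
```

Under that assumption the saving is exactly a factor of two, and it would carry over to cached prompts on disk if they store the same buffers.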