Good ideas from llama.cpp

I've been tracking the `llama.cpp` repo. I'll use this issue to list good ideas and other things we should keep up with in Rust land:

- [ ] GPTQ quantization :eyes:  https://github.com/ggerganov/llama.cpp/issues/9
- [ ] Not sure how that is even possible (isn't the task I/O bound?), but people are claiming great speedups when loading the model in parallel. This should be pretty easy to implement using `rayon`. https://github.com/ggerganov/llama.cpp/issues/85#issuecomment-1470814328
- [ ] Seems there's an issue with the normalization function used. It should be RMSNorm. Would be good to keep an eye on this, and simply swap the `ggml` function once it's implemented on the C++ side :eyes:  https://github.com/ggerganov/llama.cpp/issues/173#issuecomment-1470801468
- [x] It looks like dropping to F16 for `memory_k` and `memory_v` reduces memory usage. It is not known whether this hurts quality, but we should follow the C++ side and add a flag to drop to F16 for the memory. This would also make the cached prompts added as part of #14 take half the size on disk, which is a nice bonus: https://github.com/ggerganov/llama.cpp/pull/154#pullrequestreview-1342214004
- [x] Looks like the fix from #1 just landed upstream. We should make sure to fix it here too: https://github.com/ggerganov/llama.cpp/pull/161
- [ ] The tokenizer used in llama.cpp has some issues. It would be better to use `sentencepiece`, which is the one that was used during the original LLaMA training. There seems to be [a Rust crate for sentencepiece](https://docs.rs/sentencepiece/latest/sentencepiece/). We should check if a drop-in replacement is possible: https://github.com/ggerganov/llama.cpp/issues/167
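
For reference on the RMSNorm item above, here is a minimal sketch of what RMSNorm computes, in plain Rust on `f32` slices. This is not the `ggml` implementation and not our current code. The function names, the `weight` gain vector and the `eps` parameter are all assumptions made for illustration. A mean-subtracting, layer-norm style function is included only for contrast; whether it matches what the current `ggml` function does is also an assumption.

```rust
/// RMSNorm sketch: divide each element by the root-mean-square of the
/// whole vector, then apply a per-element gain (`weight`).
/// Unlike layer norm, the mean is NOT subtracted.
/// `eps` is a small constant to avoid dividing by zero (hypothetical parameter).
pub fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "gain must match input length");
    // Mean of the squared elements.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // Reciprocal of the root-mean-square.
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * scale * w).collect()
}

/// Layer-norm style normalization (no gain or bias), for contrast only:
/// subtract the mean, then divide by the standard deviation.
pub fn layer_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n;
    let scale = 1.0 / (var + eps).sqrt();
    x.iter().map(|v| (v - mean) * scale).collect()
}
```

Calling `rms_norm(&[3.0, 4.0], &[1.0, 1.0], 0.0)` keeps the sign and relative size of both elements and only rescales them, while `layer_norm` on the same input also re-centers it around zero. The two only agree when the input already has zero mean, which is why swapping one for the other changes model outputs.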

