Description
There have been three important updates to llama.cpp recently (a few days ago). One is the addition of pipeline parallelism (ggml-org/llama.cpp#6017). Another is a fix for the embeddings bug. A third fixes a bug where GPU memory was not released, so all GPU memory is now freed. Thank you, slaren!
These are very important updates. Looking forward to seeing the latest llama.cpp code brought into LLamaSharp.