This repository was archived by the owner on Jun 24, 2024. It is now read-only.
Let's collaborate #4
Closed
Description
[apologies for early send, accidentally hit enter]
Hey there! Turns out we think on extremely similar wavelengths - I did the exact same thing as you, for the exact same reasons (libraryification), and through the use of similar abstractions: https://github.com/philpax/ggllama
Couple of differences I spotted on my quick perusal:
- My version builds on both Windows and Linux, but fails to infer correctly past the first round. Windows performance is also pretty crappy because `ggml` doesn't support multithreading on Windows.
- I use `PhantomData` with the `Tensor`s to prevent them outliving the `Context` they're spawned from.
- I vendored `llama.cpp` in so that I could track it more directly and use its `ggml.c`/`ggml.h`, and to make it obvious which version I was porting.
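For anyone unfamiliar with the `PhantomData` trick mentioned above, here's a minimal sketch of the idea (names and fields are hypothetical, not from either codebase): a tensor handle carries a `PhantomData<&'a Context>` so the borrow checker rejects any tensor that tries to outlive the context it was spawned from — useful when the tensor only holds a raw pointer into memory the context owns.

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for a ggml-style context that owns tensor memory.
struct Context {
    buffer: Vec<f32>,
}

// The tensor holds a raw pointer into the context's buffer, so it carries
// no lifetime of its own; PhantomData ties it to the parent Context.
struct Tensor<'a> {
    data: *const f32,
    len: usize,
    _marker: PhantomData<&'a Context>,
}

impl Context {
    fn new_tensor(&self) -> Tensor<'_> {
        Tensor {
            data: self.buffer.as_ptr(),
            len: self.buffer.len(),
            _marker: PhantomData,
        }
    }
}

fn main() {
    let ctx = Context { buffer: vec![0.0; 4] };
    let t = ctx.new_tensor();
    println!("tensor len = {}", t.len);
    // This would NOT compile: the tensor cannot escape its context.
    // let escaped = {
    //     let tmp = Context { buffer: vec![] };
    //     tmp.new_tensor() // error: `tmp` does not live long enough
    // };
}
```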
Given yours actually works, I think that it's more promising :p
What are your immediate plans, and what do you want people to help you out with? My plan was to get it working, then librarify it, make a standalone Discord bot with it as a showcase, and then investigate using a Rust-native solution for the tensor manipulation (burn, ndarray, arrayfire, etc) to free it from the ggml dependency.