Add GGML model #617
Conversation
Thanks so much for the contribution! Looks great so far. Left some comments which will hopefully be helpful, and also just noting that
Thanks for the comments @haileyschoelkopf, I'm looking into implementing them today. I will also look into

Update: I've implemented a function using
Will try to review this tomorrow! Looks great, thanks so much! From a quick skim, I think there are just a couple of minor things remaining re: ensuring things actually break, rather than just logging an error message, if there's no or malformed response from the server. I can probably go and add those if need be. A few-line pointer in the library README on which GGML APIs are supported, plus a pointer on where to go to spin up an inference server for models, would be awesome if not too much work!
Thanks @haileyschoelkopf! I updated the README with the command I'm using locally. I'm pretty new to evaluating models, so I may still be missing some details.
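For anyone following along, here is a minimal sketch of the kind of request such a wrapper makes against a llama-cpp-python server. The endpoint and response fields follow the OpenAI-compatible API that llama-cpp-python exposes; the exact code in this PR may differ.

```python
import requests


def score_prompt(base_url: str, prompt: str) -> dict:
    """Ask a llama-cpp-python server to echo back per-token logprobs for `prompt`.

    Assumes the server was started with something like
    `python3 -m llama_cpp.server --model <path-to-ggml-model>` and exposes
    the OpenAI-compatible /v1/completions endpoint.
    """
    response = requests.post(
        f"{base_url}/v1/completions",
        json={
            "prompt": prompt,
            "max_tokens": 1,       # we only care about scoring the prompt itself
            "temperature": 0.0,
            "logprobs": 10,
            "echo": True,          # include logprobs for the prompt tokens
        },
        timeout=300,
    )
    response.raise_for_status()    # fail loudly rather than silently continuing
    return response.json()["choices"][0]["logprobs"]


# Example, assuming a local server on port 8000:
# print(score_prompt("http://localhost:8000", "The quick brown fox"))
```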
Thanks! LLaMA-CPP is surprisingly easy to install... do you have any recommendations for models and tasks to test with GGML? Running the 8-bit LLaMA from here: https://huggingface.co/TheBloke/LLaMa-7B-GGML gives me
Thanks @haileyschoelkopf, I'm getting similar results.

Update: I took out the reordering code, which seemed to prevent the script from running. Pushed the changes if anyone else wants to try.
Seems to be broken. I tried 85d3b8d and it fails to run.
I tried again with the previous commit 0fa2c7d and it runs, but it returns a clearly incorrect result after doing a single completion (it finishes instantly instead of taking ~30 min as you'd expect for hellaswag, and returns the following results).
Thanks @IgnacioFDM, I'll look into a regression there. The results I'm getting are even worse, although I'm able to run it for longer iterations now.
Just tried this out and I'm getting the same issue as @IgnacioFDM.
It seems llama.cpp is moving to a new GGUF format, so it might be good to change the name to GGUFLM.
@matthoffner Can you post some evals run with the patched implementation, ideally with non-quantized numbers for comparison?
Will try to revisit this week; if anyone else can confirm, that would be great too. Thanks.
@matthoffner any luck? |
@StellaAthena No luck. It runs without crashing but the results don't seem accurate. |
Hmmm. That's unfortunate, though it could be that the quantization causes performance degradation. Try manually inspecting a few of the results (esp. the incorrect ones) and then try running the same prompts through the library's native interface. Do you get the same logits when using the model through our library? |
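For reference, a rough sketch of that kind of spot check using llama-cpp-python's native interface (the model path and prompt below are placeholders, and the exact keyword arguments may differ between llama-cpp-python versions):

```python
from llama_cpp import Llama

# Placeholder path to a local GGML model file.
llm = Llama(model_path="./llama-7b.ggmlv3.q8_0.bin", logits_all=True)

out = llm(
    "The quick brown fox",
    max_tokens=1,
    temperature=0.0,
    logprobs=10,
    echo=True,        # return logprobs for the prompt tokens too
)

# Compare these against what the harness receives from the HTTP server
# for the same prompt; they should match up to quantization noise.
print(out["choices"][0]["logprobs"]["token_logprobs"])
```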
Hey @matthoffner, I opened a PR against your fork (matthoffner#1) changing how the loglikelihood for completions is computed, following more closely what is done for the

For
Return score from continuation logprobs
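The idea is to score context + continuation in a single echoed request and then sum only the logprobs of the tokens that fall inside the continuation. A hedged sketch follows; the field names follow the OpenAI-style logprobs schema, and the merged code may differ in details such as how boundary tokens are handled.

```python
def continuation_logprob(logprobs: dict, context: str) -> float:
    """Sum the logprobs of the tokens belonging to the continuation only.

    `logprobs` is the `choices[0]["logprobs"]` object returned for
    prompt=context+continuation with echo=True; it is assumed to contain
    `text_offset` (character offset of each token) and `token_logprobs`.
    """
    offsets = logprobs["text_offset"]
    token_logprobs = logprobs["token_logprobs"]

    # The first token whose character offset lies at or beyond the end of
    # the context is treated as the start of the continuation.
    start = next(i for i, off in enumerate(offsets) if off >= len(context))

    # The very first prompt token has no logprob (None), so filter those out.
    return sum(lp for lp in token_logprobs[start:] if lp is not None)
```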
Thanks @LorenzoMinto, I merged your PR.
This is the eval for
These look good enough to merge to me, unless we want to confirm with other models and more datasets first. |
Thank you everyone for the contributions! The posted results look good enough to merge. I will test this locally myself tomorrow, and I've also left a few nitpicks which I can incorporate prior to merge (in particular, we should avoid silently continuing when the model does not return logprobs as expected).
@haileyschoelkopf Let's update the
Will do! Already done for the refactor branch PR.
@haileyschoelkopf are you sure? It doesn't look like it.
In #967 (not yet merged) there's the updated model support table with GGUF.
Sorry, I shouldn't try to read things at midnight.
This PR adds initial ggml model support based on the llama-cpp-python server. This should support the majority of models, with the potential of using a library like ctransformers for ggml models that do not work with llama.cpp.
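As an illustration of the ctransformers fallback idea (not part of this PR; the repo id, file name, and model_type below are placeholders, and the call shown is ctransformers' high-level generation interface rather than anything this harness integration uses):

```python
from ctransformers import AutoModelForCausalLM

# Placeholder: a GGML model family that llama.cpp cannot load (e.g. MPT).
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/MPT-7B-GGML",               # Hugging Face repo id (placeholder)
    model_file="mpt-7b.ggmlv3.q4_0.bin",  # specific GGML file (placeholder)
    model_type="mpt",                     # tells ctransformers which architecture to use
)

# Simple generation check that the model loads and runs.
print(llm("AI is going to", max_new_tokens=32))
```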