garbage output from h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-13b #281
This seems like an issue with the tokenizer. We are adding support for custom tokenizers (#111). In the meantime, you can try directly modifying the function below to use the correct tokenizer: vllm/vllm/engine/tokenizer_utils.py, line 13 (at commit 4026a04).
Thanks, I'll give that a try! BTW I just tried
It was an easy fix:
I added an extra "or" to test for the dash variant, and all is well!
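(The modified code isn't reproduced in this copy of the thread. A rough sketch of the change described, assuming the function selected the slow tokenizer based on an "open_llama" substring check on the model name; the function name and original condition here are assumptions, not from the source:)

```python
def _uses_open_llama_tokenizer(model_name: str) -> bool:
    # Assumed original check: only the underscore variant ("open_llama").
    # Added "or" for the dash variant used by fine-tunes such as
    # "h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-13b".
    return "open_llama" in model_name or "open-llama" in model_name
```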
In hindsight, this is better:
Ah, never mind! I just looked at the changes to allow a custom tokenizer, and this whole test goes away. :)
@tjedwards We've added a new tokenizer_mode argument. Try out the following: llm = LLM(model="openlm-research/open_llama_13b", tokenizer_mode="slow")
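(For context, a complete run with that flag might look like the following sketch, based on vLLM's documented LLM / SamplingParams API; the prompt is illustrative:)

```python
from vllm import LLM, SamplingParams

# Use the slow (SentencePiece-based) tokenizer, which avoids the corrupted
# output seen with the fast tokenizer for OpenLLaMA-derived models.
llm = LLM(model="openlm-research/open_llama_13b", tokenizer_mode="slow")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(["The capital of France is"], sampling_params)
print(outputs[0].outputs[0].text)
```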
Using the simple Python script on the "supported-models" page, I was able to successfully generate output from TheBloke/Wizard-Vicuna-13B-Uncensored-HF, but h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-13b generates garbage. I'm running CUDA 11.7.1 on RHEL 8.4 with an NVIDIA A100-SXM-80GB.
Here's the script:
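(The script itself isn't reproduced in this copy of the thread. Based on the description, it was presumably close to the basic example in vLLM's supported-models/quickstart docs, roughly:)

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Swapping the model name to "TheBloke/Wizard-Vicuna-13B-Uncensored-HF"
# produces sensible text; the h2oai model below produces garbage.
llm = LLM(model="h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-13b")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```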
Here's output from TheBloke:
And here's output from h2oai:
Here's the full output: