Not able to use QLoRA models with vLLM #252
Comments
Thank you @zhuohan123 for the reply.
You just need to merge the LoRA adapter into the base model. vLLM doesn't support LoRA.
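A minimal sketch of what merging could look like with peft, assuming the QLoRA adapter was saved to /content/trained-model/ and the base model is tiiuae/falcon-7b (both are assumptions here; adjust them to your own checkpoints):

```python
# Hypothetical merge step: fold a QLoRA adapter back into the base model
# so the result is a plain HF checkpoint that vLLM can load.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "tiiuae/falcon-7b"            # assumed base model the adapter was trained on
adapter_dir = "/content/trained-model"  # assumed directory containing the QLoRA adapter

# Merging cannot be done on 4-bit weights, so load the base model in fp16/fp32.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)

# Fold the LoRA weights into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("/content/merged-model")
AutoTokenizer.from_pretrained(base_id).save_pretrained("/content/merged-model")
```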
@ehartford but merging has to be done in higher precision. Doesn't that defeat the purpose of keeping the base weights in low precision to speed up inference?
Closing in favour of the feature request #3225
I have trained a Falcon 7B model with QLoRA, but the inference time for outputs is too high, so I want to use vLLM to speed up inference. For that I used a code snippet to load the model path:
llm = LLM(model="/content/trained-model/")
But I am getting an error:
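For reference, a hedged sketch of the workflow the comments above suggest: point vLLM at the merged checkpoint (the path /content/merged-model is illustrative and carries over from the merge sketch), not at the raw adapter directory.

```python
# After merging the adapter into the base weights, the resulting plain
# checkpoint can be loaded directly with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="/content/merged-model")  # merged checkpoint, no LoRA adapter
params = SamplingParams(temperature=0.8, max_tokens=128)

outputs = llm.generate(["Explain what QLoRA is in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```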