I've used the runpod/worker-vllm:0.3.0-cuda11.8.0 container for several different LLMs and it has worked fine so far.
I've just been given a requirement to test a GGUF model (specifically https://huggingface.co/impactframes/llama3_if_ai_sdpromptmkr_q4km) and it keeps generating this error:
Entry Not Found for url: https://huggingface.co/impactframes/llama3_if_ai_sdpromptmkr_q4km/resolve/main/config.json.
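For what it's worth, the same error can be reproduced outside the worker. This is only a sketch (it assumes `huggingface_hub` is installed) and just shows that the repo doesn't contain the config.json the worker tries to fetch:

```python
# Sketch: reproduce the missing-config.json error outside the worker.
# Assumes the huggingface_hub package is installed.
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError

try:
    hf_hub_download(
        repo_id="impactframes/llama3_if_ai_sdpromptmkr_q4km",
        filename="config.json",  # the repo only ships GGUF weight files
    )
except EntryNotFoundError as err:
    print(f"config.json is not present in the repo: {err}")
```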
Is this an issue with the model, or the worker? Is there a known workaround?
Thanks
vLLM itself doesn't support GGUF, so the worker can't support it either: vllm-project/vllm#1002
@ashleykleynhans’s answer is correct
vLLM supports GGUF now. Would RunPod support it with a single command line as usual?
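For context, loading a GGUF file directly with recent vLLM looks roughly like this. This is only a sketch: GGUF support is still marked experimental upstream, and both the local file path and the tokenizer repo below are assumptions for illustration, not taken from this issue:

```python
from vllm import LLM, SamplingParams

# Sketch: recent vLLM can take a local GGUF file as the model; the
# tokenizer is typically taken from the original, unquantized repo.
# Both the file path and the tokenizer repo are illustrative.
llm = LLM(
    model="/models/llama3_if_ai_sdpromptmkr_q4km.gguf",
    tokenizer="meta-llama/Meta-Llama-3-8B-Instruct",
)

outputs = llm.generate(
    ["Write a short image prompt for a mountain sunrise."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

If the worker could pass a GGUF path plus a tokenizer repo through its usual model configuration, that would cover this use case.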
Steve