GGUF compatibility #70

adam-clarey · 2024-05-14T12:57:29Z

I've used the runpod/worker-vllm:0.3.0-cuda11.8.0 container for several different LLMs and it has worked fine so far.

I've just been given a requirement to test a GGUF model (specifically https://huggingface.co/impactframes/llama3_if_ai_sdpromptmkr_q4km), and it keeps generating errors.

Is this an issue with the model, or the worker? Is there a known workaround?

Thanks

ashleykleynhans · 2024-05-14T13:04:27Z

vLLM itself doesn't support GGUF, so the worker cannot support it either:
vllm-project/vllm#1002

alpayariyak · 2024-05-14T16:16:56Z

@ashleykleynhans’s answer is correct

thusinh1969 · 2024-09-28T08:26:01Z

vLLM supports GGUF now. Will RunPod support it with a single command line as usual?

Steve

alpayariyak closed this as completed May 14, 2024
