
[Feature]: AssertionError: Speculative decoding not yet supported for RayGPU backend. #4358

Open
cocoza4 opened this issue Apr 25, 2024 · 7 comments

Comments

cocoza4 commented Apr 25, 2024

🚀 The feature, motivation and pitch

Hi,

Do you have any workaround for the "Speculative decoding not yet supported for RayGPU backend." error, or an idea of when the RayGPU backend will support speculative decoding?

I run the vLLM server with the following command:

python3 -u -m vllm.entrypoints.openai.api_server \
       --host 0.0.0.0 \
       --model casperhansen/mixtral-instruct-awq \
       --tensor-parallel-size 4 \
       --enforce-eager \
       --quantization awq \
       --gpu-memory-utilization 0.96 \
       --kv-cache-dtype fp8 \
       --speculative-model mistralai/Mistral-7B-Instruct-v0.2 \
       --num-speculative-tokens 3 \
       --use-v2-block-manager \
       --num-lookahead-slots 5

However, I got: AssertionError: Speculative decoding not yet supported for RayGPU backend.

Alternatives

No response

Additional context

No response

@psych0v0yager

I am having the same issue

python -m vllm.entrypoints.openai.api_server \
       --model /home/llama3_70B_awq \
       --port 8000 \
       --tensor-parallel-size 2 \
       --gpu-memory-utilization 0.95 \
       --kv-cache-dtype fp8 \
       --max-num-seqs 32 \
       --speculative-model /home/llama3_8B_gptq \
       --num-speculative-tokens 3 \
       --use-v2-block-manager

@jamestwhedbee
Contributor

running into this as well

bkchang commented May 10, 2024

Running into this as well

2 similar comments
@YuCheng-Qi

Running into this as well

MRKINKI commented May 20, 2024

Running into this as well

bkchang commented May 20, 2024

This issue should have been resolved by #4840


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 28, 2024