[Bug]: speculative decoding dies: IndexError: index 0 is out of bounds for dimension 0 with size 0 #7047

@pseudotensor

Description

Your current environment

docker pull vllm/vllm-openai:latest
docker run -d --restart=always \
    --runtime=nvidia \
    --gpus '"device=1"' \
    --shm-size=10.24gb \
    -p 5001:5001 \
        -e NCCL_IGNORE_DISABLED_P2P=1 \
        -e VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u `id -u`:`id -g` \
    -v "${HOME}"/.cache:$HOME/.cache/ -v "${HOME}"/.config:$HOME/.config/   -v "${HOME}"/.triton:$HOME/.triton/  \
    --network host \
    --name phi3mini \
    vllm/vllm-openai:latest \
        --port=5001 \
        --host=0.0.0.0 \
        --model=microsoft/Phi-3-mini-128k-instruct \
        --seed 1234 \
        --trust-remote-code \
        --tensor-parallel-size=1 \
        --max-num-batched-tokens=131072 --max-log-len=100 \
        --max-model-len=131072 \
        --max-num-seqs=17 \
        --use-v2-block-manager \
        --num-speculative-tokens=5 \
        --ngram-prompt-lookup-max=4 \
        --speculative-model="[ngram]" \
        --download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.phi3.txt

🐛 Describe the bug

ERROR 08-01 21:27:03 async_llm_engine.py:56]   File "/usr/local/lib/python3.10/dist-packages/vllm/spec_decode/spec_decode_worker.py", line 375, in execute_model
ERROR 08-01 21:27:03 async_llm_engine.py:56]     return self._run_speculative_decoding_step(execute_model_req,
ERROR 08-01 21:27:03 async_llm_engine.py:56]   File "/usr/lib/python3.10/contextlib.py", line 79, in inner
ERROR 08-01 21:27:03 async_llm_engine.py:56]     return func(*args, **kwds)
ERROR 08-01 21:27:03 async_llm_engine.py:56]   File "/usr/local/lib/python3.10/dist-packages/vllm/spec_decode/spec_decode_worker.py", line 538, in _run_speculative_decoding_step
ERROR 08-01 21:27:03 async_llm_engine.py:56]     accepted_token_ids, target_logprobs = self._verify_tokens(
ERROR 08-01 21:27:03 async_llm_engine.py:56]   File "/usr/lib/python3.10/contextlib.py", line 79, in inner
ERROR 08-01 21:27:03 async_llm_engine.py:56]     return func(*args, **kwds)
ERROR 08-01 21:27:03 async_llm_engine.py:56]   File "/usr/local/lib/python3.10/dist-packages/vllm/spec_decode/spec_decode_worker.py", line 609, in _verify_tokens
ERROR 08-01 21:27:03 async_llm_engine.py:56]     accepted_token_ids = self.spec_decode_sampler(
ERROR 08-01 21:27:03 async_llm_engine.py:56]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 08-01 21:27:03 async_llm_engine.py:56]     return self._call_impl(*args, **kwargs)
ERROR 08-01 21:27:03 async_llm_engine.py:56]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 08-01 21:27:03 async_llm_engine.py:56]     return forward_call(*args, **kwargs)
ERROR 08-01 21:27:03 async_llm_engine.py:56]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rejection_sampler.py", line 82, in forward
ERROR 08-01 21:27:03 async_llm_engine.py:56]     self._batch_modified_rejection_sampling(
ERROR 08-01 21:27:03 async_llm_engine.py:56]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rejection_sampler.py", line 119, in _batch_modified_rejection_sampling
ERROR 08-01 21:27:03 async_llm_engine.py:56]     accepted = self._get_accepted(target_probs, draft_probs,
ERROR 08-01 21:27:03 async_llm_engine.py:56]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rejection_sampler.py", line 190, in _get_accepted
ERROR 08-01 21:27:03 async_llm_engine.py:56]     uniform_rand[idx, :] = torch.rand(1,
ERROR 08-01 21:27:03 async_llm_engine.py:56] IndexError: index 0 is out of bounds for dimension 0 with size 0

With the very first message to the model, "Who are you?", I got "I" back and then the server died.
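The traceback points at `rejection_sampler._get_accepted`, where `uniform_rand[idx, :] = torch.rand(1, ...)` fails. A minimal sketch of how that IndexError can arise (assumed shapes for illustration, not vLLM's actual code): if the batch of speculative proposals is empty for that step, `uniform_rand` has zero rows, so writing to row 0 raises exactly this error.

```python
import torch

k = 5  # num speculative tokens, as in --num-speculative-tokens=5
batch_size = 0  # hypothetical: an empty batch of proposals at this step

# Zero-row tensor: any row index, including 0, is out of bounds.
uniform_rand = torch.empty(batch_size, k)

try:
    uniform_rand[0, :] = torch.rand(1, k)
except IndexError as e:
    print(e)  # "index 0 is out of bounds for dimension 0 with size 0"
```

This suggests the sampler is being invoked with no proposal rows for the sequence, which matches the crash occurring on the very first generated token.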
