Closed
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the issue you submit lacks corresponding environment info and a minimal reproducible demo, it will be difficult for us to reproduce and resolve it, which reduces the likelihood of receiving feedback.
- 4. If the issue you are raising is a question rather than a bug, please open a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 5. Please use English; otherwise, the issue will be closed.
Describe the bug
I'm running SGLang v0.2.13 on an NVIDIA Tesla V100 with the flags --disable-flashinfer --disable-flashinfer-sampling.
Error:
Exception in ControllerSingle:
Traceback (most recent call last):
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/managers/controller_single.py", line 166, in start_controller_process
    controller.loop_for_forward()
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/managers/controller_single.py", line 103, in loop_for_forward
    out_pyobjs = self.tp_server.exposed_step(recv_reqs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 218, in exposed_step
    self.forward_step()
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 234, in forward_step
    self.forward_prefill_batch(new_batch)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 446, in forward_prefill_batch
    output = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 430, in forward
    return self.forward_extend(batch)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 404, in forward_extend
    return self.model.forward(
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/models/qwen2.py", line 287, in forward
    hidden_states = self.model(input_ids, positions, input_metadata, input_embeds)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/models/qwen2.py", line 255, in forward
    hidden_states, residual = layer(
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/models/qwen2.py", line 204, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
    return self._forward_method(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/layers/layernorm.py", line 45, in forward_cuda
    out = rmsnorm(x, self.weight.data, self.variance_epsilon)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/flashinfer/norm.py", line 52, in rmsnorm
    return _kernels.rmsnorm(input, weight, eps)
RuntimeError: RMSNorm failed with error code no kernel image is available for execution on the device
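For context: the traceback shows that sglang/srt/layers/layernorm.py still routes the layer norm through flashinfer's fused rmsnorm kernel even though --disable-flashinfer was passed, and that kernel has no compiled image for the V100. The operation being computed is just RMSNorm; a minimal pure-Python sketch of it (illustrative only, not SGLang's actual implementation) is:

```python
import math

def rmsnorm(x, weight, eps=1e-6):
    # RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * w_i
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]

# With unit weights, the output vector has RMS ~= 1.
x = [1.0, 2.0, 3.0, 4.0]
y = rmsnorm(x, [1.0] * 4)
```

A fallback along these lines (in practice, a vectorized torch version) is what one would expect the --disable-flashinfer path to use instead of the flashinfer kernel.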
Related PRs and comments:
- [Bug] sglang.launch_server error on V100 (disable flashinfer) #831
- feat: use FlashInfer rmsnorm and silu #907
Reproduction
Cmdline:
python -m sglang.launch_server --model-path /root/Qwen1.5-14B-Chat --host 0.0.0.0 --port 30000 --api-key sk-*** --mem-fraction-static 0.8 --tp 2 --disable-flashinfer --disable-flashinfer-sampling
Model: Qwen1.5-14B-Chat
Environment
Ubuntu 22.04.4 LTS
NVIDIA Tesla V100S-PCIE-32GB
NVIDIA Driver Version: 535.183.01, CUDA Version: 12.2