
[Bug] --disable-flashinfer is broken #1146

@objnf-dev

Description


Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

I'm running SGLang v0.2.13 on an NVIDIA Tesla V100 with the flags --disable-flashinfer --disable-flashinfer-sampling.
Error:

Exception in ControllerSingle:
Traceback (most recent call last):
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/managers/controller_single.py", line 166, in start_controller_process
    controller.loop_for_forward()
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/managers/controller_single.py", line 103, in loop_for_forward
    out_pyobjs = self.tp_server.exposed_step(recv_reqs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 218, in exposed_step
    self.forward_step()
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 234, in forward_step
    self.forward_prefill_batch(new_batch)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 446, in forward_prefill_batch
    output = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 430, in forward
    return self.forward_extend(batch)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context 
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 404, in forward_extend
    return self.model.forward(
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context 
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/models/qwen2.py", line 287, in forward
    hidden_states = self.model(input_ids, positions, input_metadata, input_embeds)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/models/qwen2.py", line 255, in forward
    hidden_states, residual = layer(
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/models/qwen2.py", line 204, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
    return self._forward_method(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/sglang/srt/layers/layernorm.py", line 45, in forward_cuda  
    out = rmsnorm(x, self.weight.data, self.variance_epsilon)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/flashinfer/norm.py", line 52, in rmsnorm
    return _kernels.rmsnorm(input, weight, eps)
RuntimeError: RMSNorm failed with error code no kernel image is available for execution on the device
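
The failing call is FlashInfer's rmsnorm kernel, which ships no compiled image for the V100's sm70 architecture, so it crashes even though FlashInfer was disabled on the command line; the RMSNorm path in sglang/srt/layers/layernorm.py apparently does not honor --disable-flashinfer. For reference, a pure-PyTorch fallback would look roughly like the sketch below (a minimal illustration only, not SGLang's actual code; the class name and structure are assumptions, loosely following vLLM's RMSNorm layer):

import torch
from torch import nn

class RMSNormFallback(nn.Module):
    """Plain-PyTorch RMSNorm usable on GPUs FlashInfer does not support (e.g. sm70)."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute in float32 for numerical stability, then cast back.
        orig_dtype = x.dtype
        x = x.to(torch.float32)
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        x = x * torch.rsqrt(variance + self.variance_epsilon)
        return (x * self.weight.to(torch.float32)).to(orig_dtype)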

Related PRs and comments:

Reproduction

Cmdline:

python -m sglang.launch_server --model-path /root/Qwen1.5-14B-Chat --host 0.0.0.0 --port 30000 --api-key sk-*** --mem-fraction-static 0.8 --tp 2 --disable-flashinfer --disable-flashinfer-sampling

Model: Qwen1.5-14B-Chat

Environment

Ubuntu 22.04.4 LTS
NVIDIA Tesla V100S-PCIE-32GB
NVIDIA Driver Version: 535.183.01 CUDA Version: 12.2
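
For reference, the V100S reports compute capability (7, 0); this can be confirmed with a standard PyTorch call (not specific to SGLang), and it matches the "no kernel image is available" error, since the installed FlashInfer wheel ships no kernels compiled for this architecture:

import torch

# Tesla V100 / V100S report (7, 0).
print(torch.cuda.get_device_capability(0))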
