
Cannot run on RTX 2080 Ti (sm75) #402

@bltcn

Description


I see that this PR (https://github.com/flashinfer-ai/flashinfer/pull/109) already added support for sm75, but when I use SGLang I get the following error:
[gpu_id=0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, cache hit rate: 0.00%, #running-req: 0, #queue-req: 0
CUDA Error: no kernel image is available for execution on the device (209) /tmp/build-via-sdist-akw2qk94/flashinfer-0.1.1+cu121torch2.3/include/flashinfer/attention/prefill.cuh: line 2128 at function cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size)
CUDA Error: no kernel image is available for execution on the device (209) /tmp/build-via-sdist-akw2qk94/flashinfer-0.1.1+cu121torch2.3/include/flashinfer/attention/prefill.cuh: line 2128 at function cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size)
CUDA Error: no kernel image is available for execution on the device (209) /tmp/build-via-sdist-akw2qk94/flashinfer-0.1.1+cu121torch2.3/include/flashinfer/attention/prefill.cuh: line 2128 at function cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size)
Exception in ModelTpServer:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/tp_worker.py", line 186, in exposed_step
    self.forward_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/tp_worker.py", line 202, in forward_step
    self.forward_prefill_batch(new_batch)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_prefill_batch
    output = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/model_runner.py", line 336, in forward
    return self.forward_extend(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/model_runner.py", line 304, in forward_extend
    return self.model.forward(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 272, in forward
    hidden_states = self.model(input_ids, positions, input_metadata, input_embeds)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 240, in forward
    hidden_states, residual = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 192, in forward
    hidden_states = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 141, in forward
    attn_output = self.attn(q, k, v, input_metadata)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/radix_attention.py", line 149, in forward
    return self.extend_forward(q, k, v, input_metadata)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/radix_attention.py", line 91, in extend_forward_flashinfer
    o = input_metadata.flashinfer_prefill_wrapper_paged.forward(
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 875, in forward
    return self._wrapper.forward(
RuntimeError: BatchPrefillWithPagedKVCache failed with error code no kernel image is available for execution on the device
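A hedged diagnostic sketch, not part of the original report: error code 209 is CUDA's `cudaErrorNoKernelImageForDevice`, returned when the loaded binary contains no code compiled for the GPU's architecture. An RTX 2080 Ti has compute capability 7.5 (sm_75), so a first check is what the device reports and which architectures the installed PyTorch was built for. The script uses only `torch.cuda.get_device_capability`, `torch.cuda.get_arch_list` and `torch.cuda.get_device_name`; the helpers `sm_tag` and `arch_is_listed` are hypothetical names for illustration. It says nothing about the FlashInfer extension itself, which was compiled separately (from the sdist, per the `build-via-sdist` path in the log), so a match on the PyTorch side does not prove FlashInfer contains sm_75 kernels.

```python
from typing import Sequence, Tuple


def sm_tag(major: int, minor: int) -> str:
    """Format a compute capability such as (7, 5) as 'sm_75'."""
    return f"sm_{major}{minor}"


def arch_is_listed(capability: Tuple[int, int], arch_list: Sequence[str]) -> bool:
    """Return True if the capability's sm_XY tag appears in arch_list.

    arch_list is expected to look like torch.cuda.get_arch_list(),
    e.g. ['sm_75', 'sm_80', ...]. Only exact 'sm_XY' entries match;
    PTX-style 'compute_XY' entries are not considered.
    """
    return sm_tag(*capability) in arch_list


def main() -> None:
    # Import lazily so the helpers above stay usable without PyTorch.
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("No CUDA device visible")
        return
    cap = torch.cuda.get_device_capability(0)  # an RTX 2080 Ti reports (7, 5)
    archs = torch.cuda.get_arch_list()
    print(f"{torch.cuda.get_device_name(0)}: {sm_tag(*cap)}")
    print("PyTorch compiled for:", archs)
    print("Device arch in PyTorch build:", arch_is_listed(cap, archs))


if __name__ == "__main__":
    main()
```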
