2080 Ti is not supported

I see that PR https://github.com/flashinfer-ai/flashinfer/pull/109 already added support for sm75, but when I use sglang I get the following error:
[gpu_id=0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, cache hit rate: 0.00%, #running-req: 0, #queue-req: 0
CUDA Error: no kernel image is available for execution on the device (209) /tmp/build-via-sdist-akw2qk94/flashinfer-0.1.1+cu121torch2.3/include/flashinfer/attention/prefill.cuh: line 2128 at function cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size)
CUDA Error: no kernel image is available for execution on the device (209) /tmp/build-via-sdist-akw2qk94/flashinfer-0.1.1+cu121torch2.3/include/flashinfer/attention/prefill.cuh: line 2128 at function cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size)
CUDA Error: no kernel image is available for execution on the device (209) /tmp/build-via-sdist-akw2qk94/flashinfer-0.1.1+cu121torch2.3/include/flashinfer/attention/prefill.cuh: line 2128 at function cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size)
Exception in ModelTpServer:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/tp_worker.py", line 186, in exposed_step
    self.forward_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/tp_worker.py", line 202, in forward_step
    self.forward_prefill_batch(new_batch)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_prefill_batch
    output = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/model_runner.py", line 336, in forward
    return self.forward_extend(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/model_runner.py", line 304, in forward_extend
    return self.model.forward(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 272, in forward
    hidden_states = self.model(input_ids, positions, input_metadata, input_embeds)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 240, in forward
    hidden_states, residual = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 192, in forward
    hidden_states = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 141, in forward
    attn_output = self.attn(q, k, v, input_metadata)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/radix_attention.py", line 149, in forward
    return self.extend_forward(q, k, v, input_metadata)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/radix_attention.py", line 91, in extend_forward_flashinfer
    o = input_metadata.flashinfer_prefill_wrapper_paged.forward(
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 875, in forward
    return self._wrapper.forward(
RuntimeError: BatchPrefillWithPagedKVCache failed with error code no kernel image is available for execution on the device
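
For reference, "no kernel image is available" usually means the installed binary contains no code compiled for the GPU's architecture. The 2080 Ti is compute capability 7.5 (sm_75). Below is a minimal sketch for checking what the local device reports. The helper names (`sm_tag`, `has_exact_kernel_image`, `describe_local_gpu`) are hypothetical, and `torch.cuda.get_arch_list()` describes the PyTorch build only, not the architectures the flashinfer wheel was compiled for, so treat the output as a hint rather than a diagnosis.

```python
def sm_tag(capability):
    """Turn a (major, minor) compute capability into a tag such as 'sm_75'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def has_exact_kernel_image(capability, compiled_archs):
    """Return True if compiled_archs contains binary code for this exact device.

    This sketch ignores PTX entries ('compute_XX'), which the driver may be
    able to JIT-compile for newer devices.
    """
    return sm_tag(capability) in set(compiled_archs)


def describe_local_gpu():
    """Report the first visible GPU, or None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    capability = torch.cuda.get_device_capability(0)
    return {
        "name": torch.cuda.get_device_name(0),
        "capability": sm_tag(capability),
        # Architectures PyTorch itself was built for (not flashinfer).
        "torch_arch_list": torch.cuda.get_arch_list(),
    }


if __name__ == "__main__":
    print(describe_local_gpu())
```

If the reported capability is `sm_75` and the flashinfer binary was built only for newer architectures, the error above would be the expected result.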
