Description
I see that PR https://github.com/flashinfer-ai/flashinfer/pull/109 already added support for sm75, but when I use it through sglang I get the following error:
[gpu_id=0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, cache hit rate: 0.00%, #running-req: 0, #queue-req: 0
CUDA Error: no kernel image is available for execution on the device (209) /tmp/build-via-sdist-akw2qk94/flashinfer-0.1.1+cu121torch2.3/include/flashinfer/attention/prefill.cuh: line 2128 at function cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size)
CUDA Error: CUDA Error: no kernel image is available for execution on the device (no kernel image is available for execution on the device209 () 209/tmp/build-via-sdist-akw2qk94/flashinfer-0.1.1+cu121torch2.3/include/flashinfer/attention/prefill.cuh) : line /tmp/build-via-sdist-akw2qk94/flashinfer-0.1.1+cu121torch2.3/include/flashinfer/attention/prefill.cuh2128: line at function 2128cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size) at function
cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size)
Exception in ModelTpServer:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/tp_worker.py", line 186, in exposed_step
self.forward_step()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/tp_worker.py", line 202, in forward_step
self.forward_prefill_batch(new_batch)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_prefill_batch
output = self.model_runner.forward(batch, ForwardMode.EXTEND)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/model_runner.py", line 336, in forward
return self.forward_extend(batch)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/controller/model_runner.py", line 304, in forward_extend
return self.model.forward(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 272, in forward
hidden_states = self.model(input_ids, positions, input_metadata, input_embeds)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 240, in forward
hidden_states, residual = layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 192, in forward
hidden_states = self.self_attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/models/qwen2.py", line 141, in forward
attn_output = self.attn(q, k, v, input_metadata)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/radix_attention.py", line 149, in forward
return self.extend_forward(q, k, v, input_metadata)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/radix_attention.py", line 91, in extend_forward_flashinfer
o = input_metadata.flashinfer_prefill_wrapper_paged.forward(
File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 875, in forward
return self._wrapper.forward(
RuntimeError: BatchPrefillWithPagedKVCache failed with error code no kernel image is available for execution on the device
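For anyone trying to narrow this down: "no kernel image is available" generally means the compiled binary contains neither machine code nor PTX that the GPU can run. Below is a minimal sketch of that compatibility rule, assuming arch names in the `sm_XY` / `compute_XY` form. The helpers `parse_arch`, `covers` and `report_local_gpu` are hypothetical names written for this illustration and are not part of flashinfer or sglang. Note that `torch.cuda.get_arch_list()` reports what PyTorch itself was built for, not what the flashinfer build above was compiled for, so treat its output only as a reference point.

```python
import re


def parse_arch(name):
    """Parse 'sm_75' or 'compute_80' into (kind, major, minor); None if unrecognized."""
    m = re.fullmatch(r"(sm|compute)_(\d+)(\d)", name)
    if m is None:
        return None  # e.g. suffixed names such as 'sm_90a' are skipped in this sketch
    return m.group(1), int(m.group(2)), int(m.group(3))


def covers(arch_list, capability):
    """Return True if any entry in arch_list can run on a GPU with this capability.

    Rule of thumb used here:
      * 'sm_XY' (machine code) runs on devices with the same major version
        and a minor version >= Y.
      * 'compute_XY' (PTX) can be JIT-compiled for any device with
        capability >= X.Y.
    """
    major, minor = capability
    for name in arch_list:
        parsed = parse_arch(name)
        if parsed is None:
            continue
        kind, a_major, a_minor = parsed
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def report_local_gpu():
    """Print the local GPU capability and whether PyTorch's own arch list covers it."""
    import torch  # imported lazily so the helpers above work without PyTorch

    capability = torch.cuda.get_device_capability(0)  # (7, 5) on an sm75 card
    arch_list = torch.cuda.get_arch_list()
    print("device capability:", capability)
    print("PyTorch arch list:", arch_list)
    print("covered by PyTorch build:", covers(arch_list, capability))
```

Calling `report_local_gpu()` on the affected machine shows the device capability. If the flashinfer build was compiled only for sm80 and newer, `covers` would return `False` for `(7, 5)`, which would be consistent with the error above. Which architectures the sdist build actually targeted is not visible in this log.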