Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
When I add `--enable-mixed-chunk`, the server eventually crashes in the scheduler. All eight TP ranks fail with the same `TypeError` in `mix_with_running`:
sglang | [2025-06-06 10:23:32 TP0] Prefill batch. #new-seq: 2, #new-token: 8191, #cached-token: 0, token usage: 0.01, #running-req: 1, #queue-req: 7
sglang | [2025-06-06 10:23:36 TP0] Prefill batch. #new-seq: 3, #new-token: 8190, #cached-token: 0, token usage: 0.02, #running-req: 2, #queue-req: 5
sglang | [2025-06-06 10:23:37 TP1] Scheduler hit an exception: Traceback (most recent call last):
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2271, in run_scheduler_process
sglang | scheduler.event_loop_normal()
sglang | File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
sglang | return func(*args, **kwargs)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 636, in event_loop_normal
sglang | batch = self.get_next_batch_to_run()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1297, in get_next_batch_to_run
sglang | new_batch = self.get_new_batch_prefill()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1458, in get_new_batch_prefill
sglang | new_batch.mix_with_running(self.running_batch)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1268, in mix_with_running
sglang | out_cache_loc = torch.cat([self.out_cache_loc, running_batch.out_cache_loc])
sglang | TypeError: expected Tensor as element 1 in argument 0, but got NoneType
sglang |
sglang | [2025-06-06 10:23:37] Received sigquit from a child process. It usually means the child failed.
sglang | [2025-06-06 10:23:37 TP3] Scheduler hit an exception: Traceback (most recent call last):
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2271, in run_scheduler_process
sglang | scheduler.event_loop_normal()
sglang | File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
sglang | return func(*args, **kwargs)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 636, in event_loop_normal
sglang | batch = self.get_next_batch_to_run()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1297, in get_next_batch_to_run
sglang | new_batch = self.get_new_batch_prefill()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1458, in get_new_batch_prefill
sglang | new_batch.mix_with_running(self.running_batch)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1268, in mix_with_running
sglang | out_cache_loc = torch.cat([self.out_cache_loc, running_batch.out_cache_loc])
sglang | TypeError: expected Tensor as element 1 in argument 0, but got NoneType
sglang |
sglang | [2025-06-06 10:23:37] Received sigquit from a child process. It usually means the child failed.
sglang | [2025-06-06 10:23:37 TP2] Scheduler hit an exception: Traceback (most recent call last):
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2271, in run_scheduler_process
sglang | scheduler.event_loop_normal()
sglang | File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
sglang | return func(*args, **kwargs)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 636, in event_loop_normal
sglang | batch = self.get_next_batch_to_run()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1297, in get_next_batch_to_run
sglang | new_batch = self.get_new_batch_prefill()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1458, in get_new_batch_prefill
sglang | new_batch.mix_with_running(self.running_batch)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1268, in mix_with_running
sglang | out_cache_loc = torch.cat([self.out_cache_loc, running_batch.out_cache_loc])
sglang | TypeError: expected Tensor as element 1 in argument 0, but got NoneType
sglang |
sglang | [2025-06-06 10:23:37] Received sigquit from a child process. It usually means the child failed.
sglang | [2025-06-06 10:23:37 TP5] Scheduler hit an exception: Traceback (most recent call last):
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2271, in run_scheduler_process
sglang | scheduler.event_loop_normal()
sglang | File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
sglang | return func(*args, **kwargs)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 636, in event_loop_normal
sglang | batch = self.get_next_batch_to_run()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1297, in get_next_batch_to_run
sglang | new_batch = self.get_new_batch_prefill()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1458, in get_new_batch_prefill
sglang | new_batch.mix_with_running(self.running_batch)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1268, in mix_with_running
sglang | out_cache_loc = torch.cat([self.out_cache_loc, running_batch.out_cache_loc])
sglang | TypeError: expected Tensor as element 1 in argument 0, but got NoneType
sglang |
sglang | [2025-06-06 10:23:37] Received sigquit from a child process. It usually means the child failed.
sglang | [2025-06-06 10:23:37 TP4] Scheduler hit an exception: Traceback (most recent call last):
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2271, in run_scheduler_process
sglang | scheduler.event_loop_normal()
sglang | File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
sglang | return func(*args, **kwargs)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 636, in event_loop_normal
sglang | batch = self.get_next_batch_to_run()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1297, in get_next_batch_to_run
sglang | new_batch = self.get_new_batch_prefill()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1458, in get_new_batch_prefill
sglang | new_batch.mix_with_running(self.running_batch)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1268, in mix_with_running
sglang | out_cache_loc = torch.cat([self.out_cache_loc, running_batch.out_cache_loc])
sglang | TypeError: expected Tensor as element 1 in argument 0, but got NoneType
sglang |
sglang | [2025-06-06 10:23:37 TP0] Scheduler hit an exception: Traceback (most recent call last):
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2271, in run_scheduler_process
sglang | scheduler.event_loop_normal()
sglang | File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
sglang | return func(*args, **kwargs)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 636, in event_loop_normal
sglang | batch = self.get_next_batch_to_run()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1297, in get_next_batch_to_run
sglang | new_batch = self.get_new_batch_prefill()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1458, in get_new_batch_prefill
sglang | new_batch.mix_with_running(self.running_batch)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1268, in mix_with_running
sglang | out_cache_loc = torch.cat([self.out_cache_loc, running_batch.out_cache_loc])
sglang | TypeError: expected Tensor as element 1 in argument 0, but got NoneType
sglang |
sglang | [2025-06-06 10:23:37] Received sigquit from a child process. It usually means the child failed.
sglang | [2025-06-06 10:23:37 TP7] Scheduler hit an exception: Traceback (most recent call last):
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2271, in run_scheduler_process
sglang | scheduler.event_loop_normal()
sglang | File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
sglang | return func(*args, **kwargs)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 636, in event_loop_normal
sglang | batch = self.get_next_batch_to_run()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1297, in get_next_batch_to_run
sglang | new_batch = self.get_new_batch_prefill()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1458, in get_new_batch_prefill
sglang | new_batch.mix_with_running(self.running_batch)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1268, in mix_with_running
sglang | out_cache_loc = torch.cat([self.out_cache_loc, running_batch.out_cache_loc])
sglang | TypeError: expected Tensor as element 1 in argument 0, but got NoneType
sglang |
sglang | [2025-06-06 10:23:37] Received sigquit from a child process. It usually means the child failed.
sglang | [2025-06-06 10:23:37 TP6] Scheduler hit an exception: Traceback (most recent call last):
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2271, in run_scheduler_process
sglang | scheduler.event_loop_normal()
sglang | File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
sglang | return func(*args, **kwargs)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 636, in event_loop_normal
sglang | batch = self.get_next_batch_to_run()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1297, in get_next_batch_to_run
sglang | new_batch = self.get_new_batch_prefill()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1458, in get_new_batch_prefill
sglang | new_batch.mix_with_running(self.running_batch)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1268, in mix_with_running
sglang | out_cache_loc = torch.cat([self.out_cache_loc, running_batch.out_cache_loc])
sglang | TypeError: expected Tensor as element 1 in argument 0, but got NoneType
sglang |
sglang | [2025-06-06 10:23:37] Received sigquit from a child process. It usually means the child failed.
sglang | [2025-06-06 10:23:37] ERROR: Traceback (most recent call last):
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 456, in wrapper
sglang | ret = self._cache[fun]
sglang | AttributeError: 'Process' object has no attribute '_cache'
sglang |
sglang | During handling of the above exception, another exception occurred:
sglang |
sglang | Traceback (most recent call last):
sglang | File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
sglang | return loop.run_until_complete(main)
sglang | File "uvloop/loop.pyx", line 1512, in uvloop.loop.Loop.run_until_complete
sglang | File "uvloop/loop.pyx", line 1505, in uvloop.loop.Loop.run_until_complete
sglang | File "uvloop/loop.pyx", line 1379, in uvloop.loop.Loop.run_forever
sglang | File "uvloop/loop.pyx", line 557, in uvloop.loop.Loop._run
sglang | File "uvloop/handles/poll.pyx", line 216, in uvloop.loop.__on_uvpoll_event
sglang | File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
sglang | File "uvloop/cbhandles.pyx", line 66, in uvloop.loop.Handle._run
sglang | File "uvloop/loop.pyx", line 399, in uvloop.loop.Loop._read_from_self
sglang | File "uvloop/loop.pyx", line 404, in uvloop.loop.Loop._invoke_signals
sglang | File "uvloop/loop.pyx", line 379, in uvloop.loop.Loop._ceval_process_signals
sglang | File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 511, in sigquit_handler
sglang | kill_process_tree(os.getpid())
sglang | File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 714, in kill_process_tree
sglang | children = itself.children(recursive=True)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 966, in children
sglang | ppid_map = _ppid_map()
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1621, in ppid_map
sglang | for pid in pids():
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1579, in pids
sglang | return [int(x) for x in os.listdir(path) if x.isdigit()]
sglang | File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 511, in sigquit_handler
sglang | kill_process_tree(os.getpid())
sglang | File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 714, in kill_process_tree
sglang | children = itself.children(recursive=True)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 989, in children
sglang | pid = stack.pop()
sglang | File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 511, in sigquit_handler
sglang | kill_process_tree(os.getpid())
sglang | File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 714, in kill_process_tree
sglang | children = itself.children(recursive=True)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 966, in children
sglang | ppid_map = _ppid_map()
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1623, in ppid_map
sglang | with open_binary(f"{procfs_path}/{pid}/stat") as f:
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 766, in open_binary
sglang | return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 511, in sigquit_handler
sglang | kill_process_tree(os.getpid())
sglang | File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 714, in kill_process_tree
sglang | children = itself.children(recursive=True)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 966, in children
sglang | ppid_map = _ppid_map()
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1623, in ppid_map
sglang | with open_binary(f"{procfs_path}/{pid}/stat") as f:
sglang | File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 508, in sigquit_handler
sglang | logger.error(
sglang | File "/usr/lib/python3.10/logging/__init__.py", line 1506, in error
sglang | self._log(ERROR, msg, args, **kwargs)
sglang | File "/usr/lib/python3.10/logging/__init__.py", line 1612, in _log
sglang | fn, lno, func, sinfo = self.findCaller(stack_info, stacklevel)
sglang | File "/usr/lib/python3.10/logging/__init__.py", line 1568, in findCaller
sglang | filename = os.path.normcase(co.co_filename)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 511, in sigquit_handler
sglang | kill_process_tree(os.getpid())
sglang | File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 714, in kill_process_tree
sglang | children = itself.children(recursive=True)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 966, in children
sglang | ppid_map = _ppid_map()
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1623, in ppid_map
sglang | with open_binary(f"{procfs_path}/{pid}/stat") as f:
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 766, in open_binary
sglang | return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 511, in sigquit_handler
sglang | kill_process_tree(os.getpid())
sglang | File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 714, in kill_process_tree
sglang | children = itself.children(recursive=True)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 998, in children
sglang | child = Process(child_pid)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 317, in __init__
sglang | self._init(pid)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 350, in _init
sglang | self._ident = self._get_ident()
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 390, in _get_ident
sglang | return (self.pid, self.create_time())
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 772, in create_time
sglang | self._create_time = self._proc.create_time()
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1646, in wrapper
sglang | return fun(self, *args, **kwargs)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1884, in create_time
sglang | ctime = float(self._parse_stat_file()['create_time'])
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1646, in wrapper
sglang | return fun(self, *args, **kwargs)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 460, in wrapper
sglang | return fun(self)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1712, in _parse_stat_file
sglang | data = bcat(f"{self._procfs_path}/{self.pid}/stat")
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 814, in bcat
sglang | return cat(fname, fallback=fallback, _open=open_binary)
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 802, in cat
sglang | with _open(fname) as f:
sglang | File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 766, in open_binary
sglang | return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
sglang | File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 511, in sigquit_handler
sglang | kill_process_tree(os.getpid())
sglang | File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 727, in kill_process_tree
sglang | sys.exit(0)
sglang | SystemExit: 0
sglang |
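Since the log repeats the same traceback for every rank, a small script can confirm that each TP rank failed with an identical error. The following is a minimal sketch that assumes the log line layout shown above (an optional `sglang | ` prefix, a `[timestamp TPn]` header, and the exception on the last line of each traceback); the function name `summarize_scheduler_crashes` is made up for illustration and is not part of SGLang.

```python
import re
from collections import defaultdict

# Matches the scheduler crash header, e.g.
# "sglang | [2025-06-06 10:23:37 TP1] Scheduler hit an exception: ..."
HEADER = re.compile(r"\[[\d\- :]+ TP(\d+)\] Scheduler hit an exception")
# Matches the final line of a traceback, e.g. "sglang | TypeError: ..."
ERROR = re.compile(r"^(?:sglang \| )?(\w+(?:Error|Exception)): (.*)$")


def summarize_scheduler_crashes(lines):
    """Map each distinct error message to the sorted TP ranks that raised it."""
    ranks_by_error = defaultdict(set)
    current_rank = None  # rank of the traceback currently being read
    for line in lines:
        line = line.strip()
        header = HEADER.search(line)
        if header:
            current_rank = int(header.group(1))
            continue
        error = ERROR.match(line)
        # Only count errors that belong to a scheduler traceback.
        if error and current_rank is not None:
            message = f"{error.group(1)}: {error.group(2)}"
            ranks_by_error[message].add(current_rank)
            current_rank = None
    return {msg: sorted(ranks) for msg, ranks in ranks_by_error.items()}
```

Passing the lines of the captured log to this function should yield one entry per distinct scheduler error with the ranks that hit it; errors outside a scheduler traceback (such as the `psutil` `AttributeError` during shutdown) are ignored.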
Reproduction
I'm using the Docker image lmsysorg/sglang:latest with the following entrypoint and command:
entrypoint: python3 -m sglang.launch_server
command: >
--model-path ${SGLANG_MODEL_PATH}
--tp 8
--trust-remote-code
--speculative-draft-model-path lmsys/DeepSeek-V3-NextN
--speculative-algorithm EAGLE
--speculative-num-steps 2
--speculative-eagle-topk 2
--speculative-num-draft-tokens 4
--cuda-graph-bs 1 2 4 8 16 32 40 48 56 64 128
--max-running-requests 128
--enable-metrics
--host 0.0.0.0
--port 3000
--enable-mixed-chunk
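The issue does not say what traffic was hitting the server. The prefill log lines suggest roughly ten requests in flight with long prompts (about 8191 new tokens per prefill batch while other requests were already running). A load generator along the following lines might help trigger the same code path. It is a sketch under assumptions: the host, prompt length, request count, and `max_new_tokens` are guesses, and it is not confirmed to reproduce the crash. It posts to SGLang's native `/generate` endpoint on the port from the command above.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Port 3000 comes from the launch command; "localhost" is an assumption.
BASE_URL = "http://localhost:3000"


def build_payload(request_id, prompt_words, max_new_tokens):
    """Build a /generate request body with a long, per-request-unique prompt."""
    # The unique prefix keeps requests from sharing a cached prefix.
    text = f"request {request_id}: " + " ".join(["hello"] * prompt_words)
    return {
        "text": text,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }


def send(payload, base_url=BASE_URL, timeout=600):
    """POST one payload to the server and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def run_load(num_requests=10, prompt_words=4000, max_new_tokens=256):
    """Send all requests concurrently so prefill overlaps with running decodes."""
    payloads = [
        build_payload(i, prompt_words, max_new_tokens) for i in range(num_requests)
    ]
    with ThreadPoolExecutor(max_workers=num_requests) as pool:
        return list(pool.map(send, payloads))
```

Calling `run_load()` against the running server sends the requests in parallel; nothing is sent on import.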
Environment
Python: 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H200
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 570.124.06
PyTorch: 2.6.0+cu124
sglang: 0.4.6.post4
sgl_kernel: 0.1.2.post1
flashinfer_python: 0.2.5+cu124torch2.6
triton: 3.2.0
transformers: 4.51.1
torchao: 0.11.0
numpy: 2.2.5
aiohttp: 3.11.18
fastapi: 0.115.12
hf_transfer: 0.1.9
huggingface_hub: 0.31.1
interegular: 0.3.3
modelscope: 1.25.0
orjson: 3.10.18
outlines: 0.1.11
packaging: 25.0
psutil: 7.0.0
pydantic: 2.11.4
python-multipart: 0.0.20
pyzmq: 26.4.0
uvicorn: 0.34.2
uvloop: 0.21.0
vllm: Module Not Found
xgrammar: 0.1.19
openai: 1.75.0
tiktoken: 0.9.0
anthropic: 0.51.0
litellm: 1.69.1
decord: 0.6.0
NVIDIA Topology:
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0  X     NV18  NV18  NV18  NV18  NV18  NV18  NV18  0-191         0              N/A
GPU1  NV18  X     NV18  NV18  NV18  NV18  NV18  NV18  0-191         0              N/A
GPU2  NV18  NV18  X     NV18  NV18  NV18  NV18  NV18  0-191         0              N/A
GPU3  NV18  NV18  NV18  X     NV18  NV18  NV18  NV18  0-191         0              N/A
GPU4  NV18  NV18  NV18  NV18  X     NV18  NV18  NV18  0-191         0              N/A
GPU5  NV18  NV18  NV18  NV18  NV18  X     NV18  NV18  0-191         0              N/A
GPU6  NV18  NV18  NV18  NV18  NV18  NV18  X     NV18  0-191         0              N/A
GPU7  NV18  NV18  NV18  NV18  NV18  NV18  NV18  X     0-191         0              N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
Hypervisor vendor: KVM
ulimit soft: 1048576