Reminder
System Info
ktransformers @ bb15fdf
kvcache-ai/sglang @ 2763727
Reproduction
[2026-05-15 19:00:54] Load weight end. elapsed=152.31 s, type=KimiK25ForConditionalGeneration, dtype=torch.bfloat16, avail mem=52.58 GB, mem usage=41.59 GB.
[2026-05-15 19:00:54] Using KV cache dtype: torch.bfloat16
[2026-05-15 19:00:54] KV Cache is allocated. #tokens: 260000, KV size: 17.02 GB
[2026-05-15 19:00:54] Memory pool end. avail mem=35.54 GB
[2026-05-15 19:00:54] Capture cuda graph begin. This can take up to several minutes. avail mem=35.10 GB
[2026-05-15 19:00:54] Capture cuda graph bs [1]
[2026-05-15 19:00:54] Scheduler hit an exception: Traceback (most recent call last):
File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 3207, in run_scheduler_process
scheduler = Scheduler(
^^^^^^^^^^
File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 367, in __init__
self.init_model_worker()
File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 563, in init_model_worker
self.init_tp_model_worker()
File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 521, in init_tp_model_worker
self.tp_worker = TpModelWorker(
^^^^^^^^^^^^^^
File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 247, in __init__
self._init_model_runner()
File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 330, in _init_model_runner
self._model_runner = ModelRunner(
^^^^^^^^^^^^
File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 418, in __init__
self.initialize()
File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 633, in initialize
self.init_device_graphs()
File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 2211, in init_device_graphs
self.graph_runner = graph_runners[self.device](self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 562, in __init__
self.model_runner.model.set_eagle3_layers_to_capture()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kt-test/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1964, in __getattr__
raise AttributeError(
AttributeError: 'KimiK25ForConditionalGeneration' object has no attribute 'set_eagle3_layers_to_capture'
[2026-05-15 19:00:54] Received sigquit from a child process. It usually means the child failed.
test.sh: line 38: 1787672 Killed python -m sglang.launch_server --host 0.0.0.0 --port 31245 --model /mnt/aux/aux/model/Kimi-K2.6/moonshotai --kt-weight-path /mnt/aux/aux/model/Kimi-K2.6/moonshotai --kt-cpuinfer 96 --kt-threadpool-count 8 --kt-num-gpu-experts 12 --kt-method RAWINT4 --kt-max-deferred-experts-per-token 2 --kt-gpu-prefill-token-threshold 1200 --kt-enable-dynamic-expert-update --attention-backend flashinfer --trust-remote-code --mem-fraction-static 0.94 --context-length 200000 --max-running-requests 1 --prefill-max-requests 1 --max-total-tokens 200000 --enable-mixed-chunk --served-model-name Kimi-K2.6 --enable-p2p-check --disable-shared-experts-fusion --chunked-prefill-size 32768 --tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --enable-hierarchical-cache --hicache-ratio 2 --hicache-size 0 --skip-server-warmup --speculative-algorithm EAGLE3 --speculative-draft-model-path /mnt/aux/aux/model/Kimi-K26-eagle3/AQ-MedAI --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --sleep-on-idle
Others
I'd like to try EAGLE3 speculative decoding with Kimi, e.g. using the EAGLE3 draft model provided by AQ-MedAI.
Their model card indicates it should work with sglang 0.5.10 (possibly other versions too; it doesn't say whether that is a minimum or a maximum).
Expected: The server starts with EAGLE3 enabled.
Observed: The scheduler crashes during CUDA graph capture with the AttributeError shown in the trace above.
I tried transformers 5.8.0, 5.7.0, and 5.6.2, but the problem appears to be in sglang's model implementation (sglang/srt/models/kimi_k25): per the trace, KimiK25ForConditionalGeneration has no set_eagle3_layers_to_capture attribute.
Additionally, since kt vendors the sglang module, I can't file any report there.
Also, there is little clarity about the differences between mainline sglang and kt sglang. There are pulls/commits that "merge" from mainline, but it's still vague what to expect.
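For anyone triaging, the failure in the trace can be shown in isolation. The following is only a minimal sketch: the two classes are hypothetical stand-ins (not the real sglang models), and supports_eagle3 is a helper name invented here for illustration. The only name taken from the trace is set_eagle3_layers_to_capture. It shows the lookup that fails and a check one could run against a model class before passing --speculative-algorithm EAGLE3.

```python
class FakeTargetWithoutHook:
    """Hypothetical stand-in for a model class that lacks the EAGLE3 hook."""

    def __getattr__(self, name):
        # Mirrors the behaviour seen in the trace: torch.nn.Module.__getattr__
        # raises AttributeError when the name is not defined on the module.
        raise AttributeError(
            f"'{type(self).__name__}' object has no attribute '{name}'"
        )


class FakeTargetWithHook:
    """Hypothetical stand-in for a model class that does expose the hook."""

    def __init__(self):
        self.capture_called = False

    def set_eagle3_layers_to_capture(self):
        # A real model would record which layers to capture; this only flags the call.
        self.capture_called = True


def supports_eagle3(model) -> bool:
    """Return True if the model exposes a callable set_eagle3_layers_to_capture."""
    return callable(getattr(model, "set_eagle3_layers_to_capture", None))


if __name__ == "__main__":
    for model in (FakeTargetWithoutHook(), FakeTargetWithHook()):
        print(type(model).__name__, supports_eagle3(model))
```

Against the real class, the equivalent check would be hasattr(model, "set_eagle3_layers_to_capture") on the loaded target model; with the commits listed above, the trace indicates it would be false for KimiK25ForConditionalGeneration.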