Speculative/EAGLE - Supported?

### Reminder

- [x] I have read the above rules and searched the existing issues.

### System Info

ktransformers @ bb15fdf
kvcache-ai/sglang @ 2763727


### Reproduction

```
[2026-05-15 19:00:54] Load weight end. elapsed=152.31 s, type=KimiK25ForConditionalGeneration, dtype=torch.bfloat16, avail mem=52.58 GB, mem usage=41.59 GB.
[2026-05-15 19:00:54] Using KV cache dtype: torch.bfloat16
[2026-05-15 19:00:54] KV Cache is allocated. #tokens: 260000, KV size: 17.02 GB
[2026-05-15 19:00:54] Memory pool end. avail mem=35.54 GB
[2026-05-15 19:00:54] Capture cuda graph begin. This can take up to several minutes. avail mem=35.10 GB
[2026-05-15 19:00:54] Capture cuda graph bs [1]
[2026-05-15 19:00:54] Scheduler hit an exception: Traceback (most recent call last):
  File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 3207, in run_scheduler_process
    scheduler = Scheduler(
                ^^^^^^^^^^
  File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 367, in __init__
    self.init_model_worker()
  File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 563, in init_model_worker
    self.init_tp_model_worker()
  File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 521, in init_tp_model_worker
    self.tp_worker = TpModelWorker(
                     ^^^^^^^^^^^^^^
  File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 247, in __init__
    self._init_model_runner()
  File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 330, in _init_model_runner
    self._model_runner = ModelRunner(
                         ^^^^^^^^^^^^
  File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 418, in __init__
    self.initialize()
  File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 633, in initialize
    self.init_device_graphs()
  File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 2211, in init_device_graphs
    self.graph_runner = graph_runners[self.device](self)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kt-test/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 562, in __init__
    self.model_runner.model.set_eagle3_layers_to_capture()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kt-test/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1964, in __getattr__
    raise AttributeError(
AttributeError: 'KimiK25ForConditionalGeneration' object has no attribute 'set_eagle3_layers_to_capture'

[2026-05-15 19:00:54] Received sigquit from a child process. It usually means the child failed.
test.sh: line 38: 1787672 Killed                  python -m sglang.launch_server --host 0.0.0.0 --port 31245 --model /mnt/aux/aux/model/Kimi-K2.6/moonshotai --kt-weight-path /mnt/aux/aux/model/Kimi-K2.6/moonshotai --kt-cpuinfer 96 --kt-threadpool-count 8 --kt-num-gpu-experts 12 --kt-method RAWINT4 --kt-max-deferred-experts-per-token 2 --kt-gpu-prefill-token-threshold 1200 --kt-enable-dynamic-expert-update --attention-backend flashinfer --trust-remote-code --mem-fraction-static 0.94 --context-length 200000 --max-running-requests 1 --prefill-max-requests 1 --max-total-tokens 200000 --enable-mixed-chunk --served-model-name Kimi-K2.6 --enable-p2p-check --disable-shared-experts-fusion --chunked-prefill-size 32768 --tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --enable-hierarchical-cache --hicache-ratio 2 --hicache-size 0 --skip-server-warmup --speculative-algorithm EAGLE3 --speculative-draft-model-path /mnt/aux/aux/model/Kimi-K26-eagle3/AQ-MedAI --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --sleep-on-idle
```

### Others

I'd like to try EAGLE3 speculative decoding with Kimi, e.g. using the EAGLE3 draft model provided by [AQ-MedAI](https://huggingface.co/AQ-MedAI/Kimi-K26-eagle3).

Their model card indicates it should work with sglang 0.5.10. It doesn't say whether that is a minimum or a maximum version, so other versions may work too.

Expected: The server starts with EAGLE3 enabled.
Observed: Startup fails during CUDA graph capture with the `AttributeError` shown in the trace above.

I tried transformers 5.8.0, 5.7.0, and 5.6.2, although the problem appears to be in the sglang model implementation (sglang/srt/models/kimi_k25), which lacks `set_eagle3_layers_to_capture`.
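
For anyone triaging this, here is a minimal sketch of a pre-flight check for the method named in the trace. The helper `supports_eagle3_capture` and both stand-in classes are hypothetical and not part of sglang; the only name taken from the trace is `set_eagle3_layers_to_capture`, and the `layer_ids=None` parameter on the stand-in is an assumption.

```python
def supports_eagle3_capture(model: object) -> bool:
    """Return True if `model` exposes a callable set_eagle3_layers_to_capture.

    Hypothetical helper, not an sglang API. getattr with a default also works
    for torch.nn.Module, whose __getattr__ raises AttributeError on a miss.
    """
    return callable(getattr(model, "set_eagle3_layers_to_capture", None))


class WithHook:
    """Hypothetical stand-in for a model class that implements the hook."""

    def set_eagle3_layers_to_capture(self, layer_ids=None):
        # Assumed signature; the trace only shows the method called without arguments.
        self.layers_to_capture = layer_ids


class WithoutHook:
    """Hypothetical stand-in for a model class that lacks the hook, as observed here."""


if __name__ == "__main__":
    print(supports_eagle3_capture(WithHook()))
    print(supports_eagle3_capture(WithoutHook()))
```

Running a check like this on the loaded model before CUDA graph capture would let the server report "EAGLE3 is not supported for this model class" instead of crashing the scheduler.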

Additionally, since kt vendors the sglang module, I can't file a report there.

Also, the differences between _mainline_ sglang and _kt_ sglang are unclear. There are pulls/commits that "merge" from mainline, but it is still vague what to expect from the kt version.
