EAGLE3.1 Support #568

Open: bluecoffee8 wants to merge 5 commits into sgl-project:main from bluecoffee8:eagle3.1

@bluecoffee8 bluecoffee8 commented May 27, 2026

Motivation

Adds EAGLE3.1 support, based on https://github.com/lightseekorg/TorchSpec/pull/97, which added it to TorchSpec.

Validation: trained a Qwen3-30B-A3B-Instruct-2507 draft model with each method (EAGLE3 and EAGLE3.1) for a single epoch on the ShareGPT dataset, based on https://github.com/sgl-project/SpecForge/blob/main/examples/run_qwen3_30b_a3b_eagle3_online.sh.

Modifications

Add EAGLE3.1 features (draft model output norm, FC norm on target hidden states).
Add an example EAGLE3.1 config for the Qwen3-30B-A3B model, and an accompanying script.

Related Issues

Accuracy Test

Benchmark & Profiling

Ran serving benchmarks for both draft models (EAGLE3 and EAGLE3.1).

Server launch (1 x H100):

SGLANG_ENABLE_SPEC_V2=1 SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python -m sglang.launch_server --host 0.0.0.0 --port 8001 --model-path Qwen/Qwen3-30B-A3B-Instruct-2507 --speculative-algorithm EAGLE3 --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --speculative-draft-model-path /path/to/draft_model

Client launch:

python3 -m sglang.bench_serving --backend sglang --host 0.0.0.0 --port 8001 --dataset-name random-ids --warmup-requests 80 --num-prompts 128 --random-input-len 1024 --random-output-len 256 --random-range-ratio 1.0 --request-rate 1.0 

Results:

EAGLE3: [benchmark output screenshot]

EAGLE3.1: [benchmark output screenshot]

Comparison:

Accept length: 1.68 => 2.30 (+37%)
p50 e2e latency: 1550.76 ms => 823.57 ms (-47%)
p50 TPOT: 5.75 ms => 2.94 ms (-49%)
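
For anyone who wants to recheck the percentages, here is a minimal sketch that recomputes the relative change from the before/after values listed above. The helper name `relative_change` is only illustrative and is not part of SGLang or SpecForge.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# (EAGLE3, EAGLE3.1) pairs copied from the comparison above.
metrics = {
    "accept length": (1.68, 2.30),
    "p50 e2e": (1550.76, 823.57),
    "p50 tpot": (5.75, 2.94),
}

for name, (eagle3, eagle3_1) in metrics.items():
    print(f"{name}: {relative_change(eagle3, eagle3_1):+.1f}%")
```

A positive value is an increase (good for accept length) and a negative value is a reduction (good for latency).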

Checklist

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@bluecoffee8 bluecoffee8 mentioned this pull request May 27, 2026
2 tasks
@bluecoffee8 bluecoffee8 marked this pull request as ready for review May 28, 2026 20:00

Comment thread specforge/core/eagle3.py

# Step 5.4: get logits
logits = self.draft_model.compute_logits(hidden_states)

compute_logits already applies the norm to hidden_states. We should gate that norm when the hidden states are already normed.
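
To illustrate the concern, here is a toy sketch in plain Python (no PyTorch, so it runs anywhere) of why an ungated second norm changes the logits. Everything in it is an assumption for illustration: `ToyDraftModel`, the `already_normed` flag, and the RMSNorm-with-weight form are hypothetical and are not taken from the SpecForge code.

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm over one vector, with a learned per-channel weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]


class ToyDraftModel:
    """Hypothetical stand-in for the draft model, not the SpecForge class."""

    def __init__(self, norm_weight: List[float], lm_head: List[List[float]]):
        self.norm_weight = norm_weight
        self.lm_head = lm_head  # one row of weights per vocabulary entry

    def compute_logits(self, hidden: List[float], already_normed: bool = False) -> List[float]:
        # Gate the norm: skip it when the caller has already applied it.
        if not already_normed:
            hidden = rms_norm(hidden, self.norm_weight)
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.lm_head]


model = ToyDraftModel(norm_weight=[2.0, 0.5], lm_head=[[1.0, 0.0], [0.0, 1.0]])
raw = [3.0, 4.0]
normed = rms_norm(raw, model.norm_weight)

once = model.compute_logits(raw)                           # norm applied inside
gated = model.compute_logits(normed, already_normed=True)  # norm skipped
twice = model.compute_logits(normed)                       # norm applied a second time
print(once == gated, once == twice)
```

With a non-trivial norm weight, the gated call matches the single-norm logits while the double-normed call does not, which is the mismatch the gate is meant to prevent.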


def project_hidden_states(self, hidden_states: torch.Tensor) -> torch.Tensor:
    # eagle 3 requires hidden states from 3 layers
    assert hidden_states.size(-1) == self.config.hidden_size * 3

Let's keep the assertion, but assert against self.num_aux_hidden_states instead of the hard-coded 3.
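
A small sketch of what such a check could look like, written against a plain shape tuple so it runs without PyTorch. The function name `check_aux_width` and the example sizes are hypothetical; in the real method the last dimension would come from `hidden_states.size(-1)`.

```python
from typing import Tuple


def check_aux_width(shape: Tuple[int, ...], hidden_size: int, num_aux_hidden_states: int) -> None:
    """Assert that the last dimension holds num_aux_hidden_states concatenated layers."""
    expected = hidden_size * num_aux_hidden_states
    assert shape[-1] == expected, (
        f"expected last dim {expected} "
        f"({num_aux_hidden_states} x {hidden_size}), got {shape[-1]}"
    )


# Illustrative shape: (batch, seq_len, 3 * hidden_size) passes for three auxiliary layers.
check_aux_width((2, 16, 3 * 2048), hidden_size=2048, num_aux_hidden_states=3)
```

Keeping the check tied to the configured count means it still catches shape mistakes if the number of auxiliary layers ever differs from 3.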

Comment thread specforge/core/eagle3.py Outdated
Comment on lines +252 to +254
# Apply output norm for EAGLE 3.1 post-norm architecture
if self.draft_model.norm_output:
    hidden_states = self.draft_model.norm(hidden_states)

Similarly, I think it makes sense to compute these as a pair, hidden_states, hidden_states_for_logits = get_hidden_states(...), where by default the right one is normed and the left one is not. When norm_output is set, the method can return both normed.
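
One possible reading of this suggestion, as a toy sketch in plain Python rather than PyTorch. The original comment elides the arguments of `get_hidden_states(...)`, so the parameters used here (`raw`, `norm_weight`, `norm_output`) and the RMSNorm form are assumptions for illustration only.

```python
import math
from typing import List, Tuple


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm over one vector, with a learned per-channel weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]


def get_hidden_states(
    raw: List[float], norm_weight: List[float], norm_output: bool
) -> Tuple[List[float], List[float]]:
    """Return (hidden_states, hidden_states_for_logits).

    The second element is always normed. The first is left un-normed by
    default and is also normed when norm_output is True.
    """
    hidden_states_for_logits = rms_norm(raw, norm_weight)
    hidden_states = hidden_states_for_logits if norm_output else raw
    return hidden_states, hidden_states_for_logits


raw = [3.0, 4.0]
weight = [1.0, 1.0]
pre_norm_pair = get_hidden_states(raw, weight, norm_output=False)
post_norm_pair = get_hidden_states(raw, weight, norm_output=True)
```

The norm is computed exactly once in either mode, so the logits path never has to guess whether its input was already normed.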

Author

@Dogacel I just pushed a commit that should address your comments. Could you please take a look? Thanks!

@Dogacel Dogacel left a comment

LGTM 👍
