EAGLE3.1 Support #568
Conversation
    # Step 5.4: get logits
    logits = self.draft_model.compute_logits(hidden_states)
compute_logits already applies the norm to hidden_states. We should gate that norm so it is skipped when the hidden states are already normed.
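A minimal sketch of the gating suggested here, written in plain Python rather than torch so it runs standalone. DraftHead, rms_norm, and the already_normed flag are hypothetical names used only for illustration; the real compute_logits signature may differ.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one vector: weight * x / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]


class DraftHead:
    """Hypothetical stand-in for the draft model's final norm and LM head."""

    def __init__(self, norm_weight, lm_head_rows):
        self.norm_weight = norm_weight
        self.lm_head_rows = lm_head_rows  # one weight row per vocab entry

    def norm(self, hidden_states):
        return rms_norm(hidden_states, self.norm_weight)

    def compute_logits(self, hidden_states, already_normed=False):
        # Gate the norm: skip it when the caller has already applied it.
        if not already_normed:
            hidden_states = self.norm(hidden_states)
        return [
            sum(w * h for w, h in zip(row, hidden_states))
            for row in self.lm_head_rows
        ]
```

With a learned, non-uniform norm weight, norming twice generally changes the result, which is why the second application needs to be skipped rather than tolerated.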
    def project_hidden_states(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # eagle 3 requires hidden states from 3 layers
        assert hidden_states.size(-1) == self.config.hidden_size * 3
Let's keep the assertion, but assert against self.num_aux_hidden_states instead of the hard-coded 3.
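One way the parameterized check could look, as a standalone sketch. The helper name check_aux_hidden_width is hypothetical, and it takes a plain shape tuple (for a torch tensor, tuple(hidden_states.shape)) so that it runs without torch.

```python
def check_aux_hidden_width(shape, hidden_size, num_aux_hidden_states):
    """Assert the last dim equals hidden_size * num_aux_hidden_states."""
    expected = hidden_size * num_aux_hidden_states
    assert shape[-1] == expected, (
        f"expected last dim {expected} "
        f"({num_aux_hidden_states} x {hidden_size}), got {shape[-1]}"
    )
```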
    # Apply output norm for EAGLE 3.1 post-norm architecture
    if self.draft_model.norm_output:
        hidden_states = self.draft_model.norm(hidden_states)
Similarly, I think it makes sense to compute these as a pair, hidden_states, hidden_states_for_logits = get_hidden_states(...), where by default the right one is normed and the left one is not. When norm_output is set, the method can return both normed.
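A rough sketch of the pair-returning shape described in this comment, in plain Python so it runs without torch. The arguments of get_hidden_states and the rms_norm helper are assumptions for illustration, not the PR's actual code.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one vector: weight * x / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]


def get_hidden_states(raw, norm_weight, norm_output=False):
    """Return (hidden_states, hidden_states_for_logits).

    The second element is always normed. The first is left un-normed by
    default and is the normed value as well when norm_output is True.
    """
    normed = rms_norm(raw, norm_weight)
    hidden_states = normed if norm_output else raw
    return hidden_states, normed
```

Returning both values from one place means the logits path never has to guess whether its input was already normed.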
@Dogacel i just pushed a commit that should address your comments. could you plz take a look thanks!
Motivation
Adds EAGLE3.1 support, based on https://github.com/lightseekorg/TorchSpec/pull/97, which added it to TorchSpec.
Validation: trained a Qwen3-30B-A3B-Instruct-2507 draft model (EAGLE3 vs EAGLE3.1) for a single epoch on the ShareGPT dataset, based on https://github.com/sgl-project/SpecForge/blob/main/examples/run_qwen3_30b_a3b_eagle3_online.sh.
Modifications
Add EAGLE3.1 features (draft model output norm, fc norm on target hidden states).
Add an example EAGLE3.1 config and script for the Qwen3-30B-A3B model.
Related Issues
Accuracy Test
Benchmark & Profiling
Ran benchmarks for both the EAGLE3 and EAGLE3.1 draft models.
Server launch (1 x H100):
Client launch:
Results:
EAGLE3
EAGLE3.1
Comparison:
Acceptance length: 1.68 => 2.30, +37%
p50 e2e: 1550.76 => 823.57, -47%
p50 TPOT: 5.75 => 2.94, -49%
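The relative changes can be recomputed from the reported before/after values with a few lines of Python. The dictionary keys below are just labels chosen for this sketch.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (EAGLE3, EAGLE3.1) values as reported in the comparison.
results = {
    "acc_length": (1.68, 2.30),
    "p50_e2e": (1550.76, 823.57),
    "p50_tpot": (5.75, 2.94),
}

for name, (before, after) in results.items():
    print(f"{name}: {before} => {after}, {percent_change(before, after):+.1f}%")
```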
Checklist