Support Eagle cuda graph for Triton backend #3500

ispobock · 2025-02-11T17:51:01Z

Motivation

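Follow-up to #3466. This PR adds CUDA graph support for EAGLE speculative decoding when the Triton attention backend is used.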

# Triton backend Eagle
python3 -m sglang.launch_server --model meta-llama/Llama-2-7b-chat-hf  --speculative-algo EAGLE --speculative-draft lmzheng/sglang-EAGLE-llama2-chat-7B --speculative-num-steps 5 --speculative-eagle-topk 8 --speculative-num-draft-tokens 64 --mem-fraction 0.8 --port 33333 --disable-radix --attention-backend triton

Speed: 337.11 tokens/s
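The PR does not say how the speed figure was measured. As a rough sketch only, a tokens-per-second number of this kind is typically the count of generated tokens divided by wall-clock time. The helper names `tokens_per_second` and `measure_decode_speed` below are hypothetical and are not part of sglang; `generate` stands in for whatever call sends a request to the server and returns the number of tokens it produced.

```python
import time
from typing import Callable


def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Return throughput as generated tokens divided by elapsed seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return num_tokens / elapsed_s


def measure_decode_speed(generate: Callable[[], int]) -> float:
    """Time one call to `generate` and report its throughput.

    `generate` is a hypothetical callable that runs a request and
    returns the number of tokens it generated.
    """
    start = time.perf_counter()
    num_tokens = generate()
    elapsed = time.perf_counter() - start
    return tokens_per_second(num_tokens, elapsed)
```

Comparing this number with and without `--attention-backend triton`, or with CUDA graphs disabled, would show the effect of the change, provided the prompt, batch size, and output length are held fixed.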

ispobock added 2 commits February 10, 2025 16:56

init

027e9d1

triton eagle cuda graph

3acce9b

ispobock requested review from Ying1123, merrymercy and zhyncs as code owners February 11, 2025 17:51

Merge branch 'main' into eagle-triton-cg

1a7499b

zhyncs approved these changes Feb 11, 2025

update test

f35d86b

zhyncs merged commit 7e6d5fc into sgl-project:main Feb 11, 2025
19 checks passed

ispobock mentioned this pull request Feb 11, 2025

[Track] DeepSeek V3/R1 nextn progress #3472

Closed

13 tasks