[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer #20059
Merged: LucasWilkinson merged 129 commits into vllm-project:main from fhl2000:full_cudagraph_FA2_FlashInfer on Aug 15, 2025 (+1,840 −598)
Commits
92b1733
FA2 and FlashInfer Full cuda graph support
fhl2000 58ce477
fix the arch support in CMakeLists.txt to include 8.9
fhl2000 c2c5fea
Refactors
fhl2000 1606880
refactors
fhl2000 806432a
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 7c5df45
refactor
fhl2000 c7a9424
Add check for separate_attention_routine flag
fhl2000 e8b9296
fix typo error
fhl2000 94d0b79
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 a67c698
refactors and rearchitect cuda graph logic
fhl2000 da110af
Refactors
fhl2000 deaf0fe
Delect one commit
fhl2000 02ca154
Add support for force_no_split_graph
fhl2000 fa0d25c
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 5108bef
Huge refactors to separate cudagraph logic from vllm compilation
fhl2000 1c1873d
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 7d4667a
refactors
fhl2000 fedff47
fix errors
fhl2000 833ac56
fix small error by lazy import
fhl2000 d57257d
handle lint-and-deploy errors for cpu execution
fhl2000 8b7ea7a
remove redundancies
fhl2000 328615d
Clear
fhl2000 debc682
Big refactors
fhl2000 cad6c39
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 dc455ee
cleanup
fhl2000 620a728
fix warmup
fhl2000 b1e6978
Commit suggestion: Update vllm/config.py
fhl2000 beee69a
commit suggestion2: Update vllm/config.py
fhl2000 21b1a8d
fix enforce_eager
fhl2000 ec79af7
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 210359a
small cleanup for pre-commit
fhl2000 11263e0
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 9a38a4e
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 699aff3
refactors
fhl2000 ef3d9d9
resolve yapf conflicts with isort
fhl2000 658565e
fixes
fhl2000 15e2b4a
fix global graph pool issue
fhl2000 4253dbf
fix refactors
fhl2000 2783e26
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 1b54962
more refactors
fhl2000 fb2a3c7
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 d6269bd
refactors for and more
fhl2000 2e1304c
fix pre-commit
fhl2000 db22ca5
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 72d40e6
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 0c79e53
change cudagraph dispatching logics; runtime style->runtime mode
fhl2000 75db3a6
pass pre-commit
fhl2000 0bca4c4
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 9d2f148
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 60bdc61
fix bug when cudagraph_separate_routine==False
fhl2000 9036bd2
recover FlashInfer from main branch
fhl2000 89ec3aa
address comments and clean up
fhl2000 4b991a3
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 614f6ea
clean up
fhl2000 c049627
fix
fhl2000 e69e488
add tests; more docs
fhl2000 835086a
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 534410e
clean up
fhl2000 618f7c0
small fix
fhl2000 1b343eb
add more docs
fhl2000 532f245
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 431a726
simplify the logic
fhl2000 19faeda
fix CI failures
fhl2000 348a117
fix CI failures again
fhl2000 fc5e37a
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 4d9829f
fix pre-commit
fhl2000 7773608
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 a692bb6
fix CI
fhl2000 543f264
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 3e5959a
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 aa35551
fix errors;move default initialization of cudagraph_mode to __post_in…
fhl2000 bad2710
fix a potential bug
fhl2000 f175c16
Merge branch 'vllm-project:main' into full_cudagraph_FA2_FlashInfer
fhl2000 9916a75
Merge remote-tracking branch 'origin/main' into pr-20059
LucasWilkinson 81d7561
wip rework cudagraph_mode
LucasWilkinson 0137d84
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 1bfb855
fix and re-enable FlashInfer full cudagraph
fhl2000 24c40ab
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 95d94f8
fix some CI tests
fhl2000 e7763ef
fallback
LucasWilkinson 803a185
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson 645accf
warn preferred
LucasWilkinson 5029a6a
fix bugs and some refactors;temporarily add FULL_DOUBLE mode
fhl2000 fef7eee
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 e796196
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson 816024e
fix incorrectly inferring type from CUDAGraphWrapper
fhl2000 651f729
fix and refactor cudagraph_mode checkings
fhl2000 38ddeaf
remove full double
LucasWilkinson 9ca04ed
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 a7adfae
Merge remote-tracking branch 'origin/main' into fhl2000/full_cudagrap…
LucasWilkinson 14e83f5
Merge branch 'fhl2000_full_cudagraph_FA2_FlashInfer_merge' into full_…
LucasWilkinson 9cc6b93
fix
LucasWilkinson 1e97920
fix
LucasWilkinson 25b6242
cleanup
LucasWilkinson 85f20bf
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000 766eb7c
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson b0374be
deprecation
LucasWilkinson a160dd4
migrate flags
LucasWilkinson c2dc791
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson 028f119
cleanup
LucasWilkinson 43db16d
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson 2cad036
fix some unit tests
LucasWilkinson 6839e88
more cleanup
LucasWilkinson 04ed99a
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson 3f2b279
fix more unit tests
LucasWilkinson d500150
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
fhl2000 a56d549
fix is_attention_splitting;fix new mamba_attn cg support
fhl2000 3499d7b
wip
LucasWilkinson 83d4e7c
stabilize unit test
LucasWilkinson bf8a51d
cleanup
LucasWilkinson d1f62e4
unit test fix
LucasWilkinson 7e19ca4
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson c722f2c
refactor
LucasWilkinson 1937615
remove accidentally committed file
LucasWilkinson ce9cc82
fix XPU tests
LucasWilkinson f3561f9
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson 9d6b189
fix xpu
LucasWilkinson 3c4b532
match HPU cudagraph handling + down grade log
LucasWilkinson bed9576
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson 19f7447
fix xpu
LucasWilkinson 0122313
unit test fixes
LucasWilkinson 3a2041b
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson 641b10b
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson 974c707
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
LucasWilkinson 3805f3f
Apply suggestions from code review
LucasWilkinson f2c437a
review comments
LucasWilkinson 1ff41d8
Update vllm/v1/worker/gpu_model_runner.py
LucasWilkinson af2a38c
Merge remote-tracking branch 'origin/main' into full_cudagraph_FA2_Fl…
fhl2000 f751e50
Merge branch 'main' into full_cudagraph_FA2_FlashInfer
fhl2000