Pull requests: vllm-project/flash-attention

Pass s_aux through flash_attn_with_kvcache
#79 by tdoublep was merged Aug 8, 2025
Attention Sinks Perf Boost
#78 by LucasWilkinson was merged Aug 9, 2025
Support FA3 Attention Sink
#75 by zyongye was merged Aug 5, 2025
cmake: get rid of empty VLLM_FA_GPU_ARCHES variable
#74 by dtrifiro was merged Jul 31, 2025
vllm_flash_attn: Setup for vllm_kernels package
#71 by seemethere was merged Jun 23, 2025
varlen combine scheduler
#70 by LucasWilkinson was merged Jun 16, 2025
FA2 8.0 PTX
#69 by LucasWilkinson was merged Jun 16, 2025
how are you supposed to run tests?
#68 by foolusion was closed May 22, 2025
Add rotary triton operator to vllm_flash_attn
#64 by cynthieye was merged Apr 24, 2025
Sparse attention window size bug fix
#60 by mklasby was merged Apr 12, 2025
[Easy] replace c10::optional with std::optional
#58 by yeqcharlotte was merged Mar 27, 2025 (see the sketch after this list)
Fix missing import in __init__
#57 by LucasWilkinson was merged Mar 25, 2025
Avoid selecting fav3 for Blackwell
#55 by kushanam was merged Mar 5, 2025
Fix building on CUDA 12.1
#53 by LucasWilkinson was merged Feb 27, 2025
adding preliminary Blackwell support
#51 by kushanam was merged Mar 4, 2025
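
A minimal sketch related to #58 above ("replace c10::optional with std::optional"): in recent PyTorch releases c10::optional is an alias for std::optional, so the change is essentially a mechanical type substitution in the C++/CUDA sources. The helper below is hypothetical (not taken from this repo) and only illustrates std::optional for an optional kernel parameter.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: resolve an optional sliding-window size to a
// sentinel default, the way an optional kernel argument might be handled.
// Before the migration, the parameter would have been spelled
// c10::optional<int64_t>; after it, std::optional<int64_t>.
int64_t window_or_default(std::optional<int64_t> window_size) {
  return window_size.value_or(-1);  // -1 means "no sliding window"
}

int main() {
  // std::nullopt models an argument the caller did not supply.
  return window_or_default(std::nullopt) == -1 ? 0 : 1;
}
```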