Pull requests: vllm-project/flash-attention

Pass s_aux through flash_attn_with_kvcache
#79 by tdoublep was merged Aug 8, 2025
Attention Sinks Perf Boost
#78 by LucasWilkinson was merged Aug 9, 2025
Support FA3 Attention Sink
#75 by zyongye was merged Aug 5, 2025
cmake: get rid of empty VLLM_FA_GPU_ARCHES variable
#74 by dtrifiro was merged Jul 31, 2025
vllm_flash_attn: Setup for vllm_kernels package
#71 by seemethere was merged Jun 23, 2025
varlen combine scheduler
#70 by LucasWilkinson was merged Jun 16, 2025
FA2 8.0 PTX
#69 by LucasWilkinson was merged Jun 16, 2025
how are you supposed to run tests?
#68 by foolusion was closed May 22, 2025
Add rotary triton operator to vllm_flash_attn
#64 by cynthieye was merged Apr 24, 2025
Sparse attention window size bug fix
#60 by mklasby was merged Apr 12, 2025
[Easy] replace c10::optional with std::optional
#58 by yeqcharlotte was merged Mar 27, 2025 (see the sketch after this list)
Fix missing import in __init__
#57 by LucasWilkinson was merged Mar 25, 2025
Avoid selecting fav3 for Blackwell
#55 by kushanam was merged Mar 5, 2025
Fix building on CUDA 12.1
#53 by LucasWilkinson was merged Feb 27, 2025
adding preliminary Blackwell support
#51 by kushanam was merged Mar 4, 2025
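
A minimal sketch related to #58 above ("replace c10::optional with std::optional"): in recent PyTorch releases c10::optional is an alias for std::optional, so the change is essentially a mechanical type substitution in the C++/CUDA sources. The helper below is hypothetical (not taken from this repo) and only illustrates std::optional for an optional kernel parameter.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: resolve an optional sliding-window size to a
// sentinel default, the way an optional kernel argument might be handled.
// Before the migration, the parameter would have been spelled
// c10::optional<int64_t>; after it, std::optional<int64_t>.
int64_t window_or_default(std::optional<int64_t> window_size) {
  return window_size.value_or(-1);  // -1 means "no sliding window"
}

int main() {
  // std::nullopt models an argument the caller did not supply.
  return window_or_default(std::nullopt) == -1 ? 0 : 1;
}
```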