Releases · flash-algo/flash-sparse-attention
v1.2.3
What's Changed
- Add selectable masking strategies for attention by @LoserCheems in #204
- Refactor attention block smoothing for consistency by @LoserCheems in #205
- Optimize Triton version: GQA, mask/bias broadcasting, skip inactive tiles, and stability fixes by @LoserCheems in #200
- [FEATURE SUPPORT] Triton special compact dynamic-mask attention: 1.6× faster fwd+bwd, numerically equivalent by @LoserCheems in #206 (see the sketch below)
- Fix documentation and references for Flash Sparse Attention by @LoserCheems in #207
Full Changelog: v1.2.2...v1.2.3
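The dynamic-mask attention work in this release (#200, #204, #206) is easiest to picture as a reference computation: an additive bias ranks the keys per head, only the top-k keys stay active, and the rest are masked out before softmax. The pure-PyTorch sketch below is illustrative only; the function names are hypothetical, and the project's Triton/CUDA kernels fuse this computation rather than materializing full score matrices.

```python
# Illustrative reference only: a pure-PyTorch sketch of dynamic-mask attention.
# Names such as dynamic_topk_mask and reference_dmattn are hypothetical, not
# this project's API; the fused kernels avoid building full score matrices.
import torch
import torch.nn.functional as F

def dynamic_topk_mask(bias: torch.Tensor, k: int) -> torch.Tensor:
    """bias: (batch, heads, 1, seqlen_k) -> boolean mask of the same shape,
    True where a key's bias is among the k largest."""
    topk = bias.topk(min(k, bias.size(-1)), dim=-1).indices
    mask = torch.zeros_like(bias, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return mask

def reference_dmattn(q, k, v, bias, keep: int):
    # q, k, v: (batch, heads, seqlen, head_dim); bias broadcasts over queries.
    mask = dynamic_topk_mask(bias, keep)                 # (B, H, 1, Sk)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores + bias                               # additive attention bias
    scores = scores.masked_fill(~mask, float("-inf"))    # drop inactive keys
    return F.softmax(scores, dim=-1) @ v

# Tiny usage example: 4 queries attend to the 3 highest-bias keys out of 6.
q, k, v = (torch.randn(1, 2, n, 8) for n in (4, 6, 6))
out = reference_dmattn(q, k, v, torch.randn(1, 2, 1, 6), keep=3)  # (1, 2, 4, 8)
```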
v1.2.2
What's Changed
- [FEATURE SUPPORT] Robust dBias accumulation for seqlen_q_bias == 1 by @LoserCheems in #194
- [FEATURE SUPPORT] Centralize dynamic mask creation for FDMA by @LoserCheems in #197
- Update documentation to use mask utility in examples by @LoserCheems in #198
- Fix attention bias calculation and dbias handling by @LoserCheems in #199
- Add block-wise smoothing to attention mask by @LoserCheems in #201 (sketched below)
- [FEATURE SUPPORT] Move scaling out of streaming loops, bias-initialized acc_s, and fix dQ double-scaling by @LoserCheems in #203
Full Changelog: v1.2.1...v1.2.2
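One plausible reading of the block-wise smoothing added in #201 (and made consistent in #205) is coarsening a per-key boolean mask to kernel tile granularity, so that each tile is either fully active or can be skipped outright. The sketch below is a plain-PyTorch illustration under that assumption, not the project's implementation; the helper name and default block size are made up.

```python
# Illustrative only: coarsen a per-token attention mask to key-block granularity
# so a tiled kernel can skip fully inactive key blocks. Hypothetical helper.
import torch

def smooth_mask_blockwise(mask: torch.Tensor, block_k: int = 128) -> torch.Tensor:
    """mask: (batch, heads, seqlen_q, seqlen_k) boolean. A key block becomes
    active for a query row if any key inside that block is active for it."""
    b, h, sq, sk = mask.shape
    pad = (-sk) % block_k                              # pad keys to a block multiple
    if pad:
        mask = torch.cat([mask, mask.new_zeros(b, h, sq, pad)], dim=-1)
    blocks = mask.view(b, h, sq, -1, block_k)          # (B, H, Sq, nblocks, block_k)
    block_active = blocks.any(dim=-1, keepdim=True)    # block is on if any key is on
    smoothed = block_active.expand_as(blocks).reshape(b, h, sq, -1)
    return smoothed[..., :sk]                          # drop the padding again
```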
v1.2.1
What's Changed
- Implement variable-length attention with mask and bias support by @LoserCheems in #185
- Add issue/PR templates by @LoserCheems in #186
- [FEATURE SUPPORT] Variable-Length Attention with Padding-Free Execution by @LoserCheems in #188 (see the packing sketch below)
- [FEATURE SUPPORT] Broadcastable 4D mask/bias, 128-rounded key length, stride-0 broadcasting, and dbias reductions by @LoserCheems in #190
- Refactor bias initialization and enhance bias computation in FlashDMAttnFunc by @LoserCheems in #191
- Fix attention_mask and attention_bias shape descriptions and remove redundant checks by @LoserCheems in #192
- Enhance bias gradient accumulation in backward pass by @LoserCheems in #193
Full Changelog: v1.2.0...v1.2.1
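Variable-length, padding-free execution (#185, #188) follows the usual varlen layout: all sequences in a batch are concatenated along one token dimension and a cumulative-lengths vector marks the boundaries. The sketch below shows only that packing step in plain PyTorch as an illustration; the actual varlen entry point in this project and its argument names may differ.

```python
# Illustrative only: packing a padded batch into the "varlen" layout that
# padding-free attention kernels typically consume (concatenated tokens plus
# cumulative sequence lengths). Function name and layout are assumptions.
import torch

def pack_varlen(x: torch.Tensor, lengths: torch.Tensor):
    """x: (batch, max_seqlen, ...) padded; lengths: (batch,) true lengths.
    Returns (packed, cu_seqlens) with packed: (total_tokens, ...)."""
    packed = torch.cat([x[i, : lengths[i]] for i in range(x.size(0))], dim=0)
    cu_seqlens = torch.zeros(x.size(0) + 1, dtype=torch.int32, device=x.device)
    cu_seqlens[1:] = torch.cumsum(lengths, dim=0)
    return packed, cu_seqlens

# Example: two sequences of lengths 3 and 5 packed into 8 tokens total.
hidden = torch.randn(2, 5, 64)
packed, cu_seqlens = pack_varlen(hidden, torch.tensor([3, 5]))
assert packed.shape[0] == 8 and cu_seqlens.tolist() == [0, 3, 8]
```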
v1.2.0
What's Changed
- [BUG FIX] Fix mask/bias memory access and vectorization issues in kernels by @LoserCheems in #182
Full Changelog: v1.1.9...v1.2.0
v1.1.9
What's Changed
- Refactor attention mask and bias handling for efficiency by @LoserCheems in #177
- [BUG FIX] SM80 NaN in bias.grad when both mask and bias are enabled by @LoserCheems in #179
Full Changelog: v1.1.8...v1.1.9
v1.1.8
v1.1.7
What's Changed
- Increase GitHub Actions build timeout to 6 hours by @LoserCheems in #175
Full Changelog: v1.1.6...v1.1.7
v1.1.6
What's Changed
- Remove CUDA architecture '120' for compatibility by @LoserCheems in #174
Full Changelog: v1.1.5...v1.1.6
v1.1.5
What's Changed
- Expand build matrix for ARM64 and additional CUDA architectures by @LoserCheems in #173 (see the build sketch below)
Full Changelog: v1.1.4...v1.1.5
v1.1.4
What's Changed
- Refine build matrix and CUDA architecture specifications by @LoserCheems in #172
Full Changelog: v1.1.1...v1.1.4
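The v1.1.4 through v1.1.6 releases adjust which CUDA architectures the prebuilt wheels target. When building from source, the target architectures can usually be constrained through PyTorch's standard TORCH_CUDA_ARCH_LIST variable, as sketched below; whether this project reads that variable, and which architectures its kernels actually support, is an assumption to check against the project's own build documentation.

```python
# Illustrative only: constraining the CUDA architectures compiled for a PyTorch
# C++/CUDA extension built from source. TORCH_CUDA_ARCH_LIST is a standard
# PyTorch build variable; the architecture list shown here is an assumption,
# not this project's documented support matrix.
import os
import subprocess

# Compile only for Ampere (8.0/8.6) and Hopper (9.0); skip architectures the
# local CUDA toolchain cannot target (cf. the removal of arch '120' above).
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0;8.6;9.0"
subprocess.run(["pip", "install", "--no-build-isolation", "."], check=True)
```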