Releases · flash-algo/flash-sparse-attention
v1.2.3
What's Changed
- Add selectable masking strategies for attention by @LoserCheems in #204
- Refactor attention block smoothing for consistency by @LoserCheems in #205
- Optimize Triton version: GQA, mask/bias broadcasting, skip inactive tiles, and stability fixes by @LoserCheems in #200
- [FEATURE SUPPORT] Triton special compact dynamic-mask attention: 1.6× faster fwd+bwd, numerically equivalent by @LoserCheems in #206 (see the sketch below)
- Fix documentation and references for Flash Sparse Attention by @LoserCheems in #207
Full Changelog: v1.2.2...v1.2.3
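The dynamic-mask attention work in this release (#200, #204, #206) is easiest to picture as a reference computation: an additive bias ranks the keys per head, only the top-k keys stay active, and the rest are masked out before softmax. The pure-PyTorch sketch below is illustrative only; the function names are hypothetical, and the project's Triton/CUDA kernels fuse this computation rather than materializing full score matrices.

```python
# Illustrative reference only: a pure-PyTorch sketch of dynamic-mask attention.
# Names such as dynamic_topk_mask and reference_dmattn are hypothetical, not
# this project's API; the fused kernels avoid building full score matrices.
import torch
import torch.nn.functional as F

def dynamic_topk_mask(bias: torch.Tensor, k: int) -> torch.Tensor:
    """bias: (batch, heads, 1, seqlen_k) -> boolean mask of the same shape,
    True where a key's bias is among the k largest."""
    topk = bias.topk(min(k, bias.size(-1)), dim=-1).indices
    mask = torch.zeros_like(bias, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return mask

def reference_dmattn(q, k, v, bias, keep: int):
    # q, k, v: (batch, heads, seqlen, head_dim); bias broadcasts over queries.
    mask = dynamic_topk_mask(bias, keep)                 # (B, H, 1, Sk)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores + bias                               # additive attention bias
    scores = scores.masked_fill(~mask, float("-inf"))    # drop inactive keys
    return F.softmax(scores, dim=-1) @ v

# Tiny usage example: 4 queries attend to the 3 highest-bias keys out of 6.
q, k, v = (torch.randn(1, 2, n, 8) for n in (4, 6, 6))
out = reference_dmattn(q, k, v, torch.randn(1, 2, 1, 6), keep=3)  # (1, 2, 4, 8)
```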
v1.2.2
What's Changed
- [FEATURE SUPPORT] Robust dBias accumulation for seqlen_q_bias == 1 by @LoserCheems in #194
- [FEATURE SUPPORT] Centralize dynamic mask creation for FDMA by @LoserCheems in #197
- Update documentation to use mask utility in examples by @LoserCheems in #198
- Fix attention bias calculation and dbias handling by @LoserCheems in #199
- Add block-wise smoothing to attention mask by @LoserCheems in #201 (sketched below)
- [FEATURE SUPPORT] Move scaling out of streaming loops, bias-initialized acc_s, and fix dQ double-scaling by @LoserCheems in #203
Full Changelog: v1.2.1...v1.2.2
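One plausible reading of the block-wise smoothing added in #201 (and made consistent in #205) is coarsening a per-key boolean mask to kernel tile granularity, so that each tile is either fully active or can be skipped outright. The sketch below is a plain-PyTorch illustration under that assumption, not the project's implementation; the helper name and default block size are made up.

```python
# Illustrative only: coarsen a per-token attention mask to key-block granularity
# so a tiled kernel can skip fully inactive key blocks. Hypothetical helper.
import torch

def smooth_mask_blockwise(mask: torch.Tensor, block_k: int = 128) -> torch.Tensor:
    """mask: (batch, heads, seqlen_q, seqlen_k) boolean. A key block becomes
    active for a query row if any key inside that block is active for it."""
    b, h, sq, sk = mask.shape
    pad = (-sk) % block_k                              # pad keys to a block multiple
    if pad:
        mask = torch.cat([mask, mask.new_zeros(b, h, sq, pad)], dim=-1)
    blocks = mask.view(b, h, sq, -1, block_k)          # (B, H, Sq, nblocks, block_k)
    block_active = blocks.any(dim=-1, keepdim=True)    # block is on if any key is on
    smoothed = block_active.expand_as(blocks).reshape(b, h, sq, -1)
    return smoothed[..., :sk]                          # drop the padding again
```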
v1.2.1
What's Changed
- Implement variable-length attention with mask and bias support by @LoserCheems in #185
- Add issue/PR templates by @LoserCheems in #186
- [FEATURE SUPPORT] Variable-Length Attention with Padding-Free Execution by @LoserCheems in #188 (see the packing sketch below)
- [FEATURE SUPPORT] Broadcastable 4D mask/bias, 128-rounded key length, stride-0 broadcasting, and dbias reductions by @LoserCheems in #190
- Refactor bias initialization and enhance bias computation in FlashDMAttnFunc by @LoserCheems in #191
- Fix attention_mask and attention_bias shape descriptions and remove redundant checks by @LoserCheems in #192
- Enhance bias gradient accumulation in backward pass by @LoserCheems in #193
Full Changelog: v1.2.0...v1.2.1
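Variable-length, padding-free execution (#185, #188) follows the usual varlen layout: all sequences in a batch are concatenated along one token dimension and a cumulative-lengths vector marks the boundaries. The sketch below shows only that packing step in plain PyTorch as an illustration; the actual varlen entry point in this project and its argument names may differ.

```python
# Illustrative only: packing a padded batch into the "varlen" layout that
# padding-free attention kernels typically consume (concatenated tokens plus
# cumulative sequence lengths). Function name and layout are assumptions.
import torch

def pack_varlen(x: torch.Tensor, lengths: torch.Tensor):
    """x: (batch, max_seqlen, ...) padded; lengths: (batch,) true lengths.
    Returns (packed, cu_seqlens) with packed: (total_tokens, ...)."""
    packed = torch.cat([x[i, : lengths[i]] for i in range(x.size(0))], dim=0)
    cu_seqlens = torch.zeros(x.size(0) + 1, dtype=torch.int32, device=x.device)
    cu_seqlens[1:] = torch.cumsum(lengths, dim=0)
    return packed, cu_seqlens

# Example: two sequences of lengths 3 and 5 packed into 8 tokens total.
hidden = torch.randn(2, 5, 64)
packed, cu_seqlens = pack_varlen(hidden, torch.tensor([3, 5]))
assert packed.shape[0] == 8 and cu_seqlens.tolist() == [0, 3, 8]
```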
v1.2.0
What's Changed
- [BUG FIX] Fix mask/bias memory access and vectorization issues in kernels by @LoserCheems in #182
Full Changelog: v1.1.9...v1.2.0
v1.1.9
What's Changed
- Refactor attention mask and bias handling for efficiency by @LoserCheems in #177
- [BUG FIX] SM80 NaN in bias.grad when both mask and bias are enabled by @LoserCheems in #179
Full Changelog: v1.1.8...v1.1.9
v1.1.8
v1.1.7
What's Changed
- Increase GitHub Actions build timeout to 6 hours by @LoserCheems in #175
Full Changelog: v1.1.6...v1.1.7
v1.1.6
What's Changed
- Remove CUDA architecture '120' for compatibility by @LoserCheems in #174
Full Changelog: v1.1.5...v1.1.6
v1.1.5
What's Changed
- Expand build matrix for ARM64 and additional CUDA architectures by @LoserCheems in #173 (see the build sketch below)
Full Changelog: v1.1.4...v1.1.5
v1.1.4
What's Changed
- Refine build matrix and CUDA architecture specifications by @LoserCheems in #172
Full Changelog: v1.1.1...v1.1.4
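The v1.1.4 through v1.1.6 releases adjust which CUDA architectures the prebuilt wheels target. When building from source, the target architectures can usually be constrained through PyTorch's standard TORCH_CUDA_ARCH_LIST variable, as sketched below; whether this project reads that variable, and which architectures its kernels actually support, is an assumption to check against the project's own build documentation.

```python
# Illustrative only: constraining the CUDA architectures compiled for a PyTorch
# C++/CUDA extension built from source. TORCH_CUDA_ARCH_LIST is a standard
# PyTorch build variable; the architecture list shown here is an assumption,
# not this project's documented support matrix.
import os
import subprocess

# Compile only for Ampere (8.0/8.6) and Hopper (9.0); skip architectures the
# local CUDA toolchain cannot target (cf. the removal of arch '120' above).
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0;8.6;9.0"
subprocess.run(["pip", "install", "--no-build-isolation", "."], check=True)
```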