Releases: flash-algo/flash-sparse-attention

v1.2.3

09 Nov 15:55
b746952

What's Changed

  • Add selectable masking strategies for attention by @LoserCheems in #204
  • Refactor attention block smoothing for consistency by @LoserCheems in #205
  • Optimize Triton version: GQA, mask/bias broadcasting, skip inactive tiles, and stability fixes by @LoserCheems in #200
  • [FEATURE SUPPORT] Triton special compact dynamic-mask attention: 1.6× faster fwd+bwd, numerically equivalent by @LoserCheems in #206 (the dynamic-mask idea is sketched below)
  • Fix documentation and references for Flash Sparse Attention by @LoserCheems in #207

Full Changelog: v1.2.2...v1.2.3
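
The dynamic-mask idea behind #204 and #206 can be sketched in a few lines of plain PyTorch: each query row keeps only its top-k keys by bias score, and everything else is excluded from the softmax. This is a conceptual sketch only; `topk_mask` and `masked_attention` are illustrative names, not this library's API, and the real kernels fuse this selection into tiled Triton/CUDA code.

```python
import torch


def topk_mask(bias: torch.Tensor, k: int) -> torch.Tensor:
    """Boolean mask keeping, per query row, the k keys with the largest bias."""
    # bias: (batch, heads, q_len, k_len)
    idx = bias.topk(k, dim=-1).indices
    return torch.zeros_like(bias, dtype=torch.bool).scatter(-1, idx, True)


def masked_attention(q, k, v, bias, keep_k):
    # Standard scaled dot-product attention with an additive bias, restricted
    # to the dynamically selected keys; masked entries never enter the softmax.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5 + bias
    scores = scores.masked_fill(~topk_mask(bias, keep_k), float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


b, h, n, d = 1, 2, 8, 16
q, k, v = (torch.randn(b, h, n, d) for _ in range(3))
bias = torch.randn(b, h, n, n)
print(masked_attention(q, k, v, bias, keep_k=4).shape)  # (1, 2, 8, 16)
```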


v1.2.2

05 Nov 08:10

What's Changed

  • [FEATURE SUPPORT] Robust dBias accumulation for seqlen_q_bias == 1 by @LoserCheems in #194
  • [FEATURE SUPPORT] Centralize dynamic mask creation for FDMA by @LoserCheems in #197
  • Update documentation to use mask utility in examples by @LoserCheems in #198
  • Fix attention bias calculation and dbias handling by @LoserCheems in #199
  • Add block-wise smoothing to attention mask by @LoserCheems in #201 (sketched below)
  • [FEATURE SUPPORT] Move scaling out of streaming loops, bias-initialized acc_s, and fix dQ double-scaling by @LoserCheems in #203

Full Changelog: v1.2.1...v1.2.2
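
A conceptual sketch of block-wise mask smoothing (#201): round a per-element boolean mask up to whole (BLOCK_M × BLOCK_N) tiles, so a tiled kernel can treat every tile as entirely active or entirely skippable. Plain PyTorch; `smooth_mask_blockwise` is an illustrative name, not this library's API.

```python
import torch


def smooth_mask_blockwise(mask: torch.Tensor, block_m: int, block_n: int) -> torch.Tensor:
    # mask: (..., q_len, k_len) boolean; lengths assumed divisible by the block sizes.
    *lead, m, n = mask.shape
    tiles = mask.reshape(*lead, m // block_m, block_m, n // block_n, block_n)
    # A tile is active if any element inside it is active (OR-pool); the
    # decision is then broadcast back to every element of the tile.
    active = tiles.any(dim=-1, keepdim=True).any(dim=-3, keepdim=True)
    return active.expand_as(tiles).reshape(mask.shape)


mask = torch.rand(1, 1, 128, 128) > 0.95           # sparse element-level mask
smoothed = smooth_mask_blockwise(mask, 64, 64)     # block-aligned mask
print(mask.float().mean().item(), smoothed.float().mean().item())
```

Smoothing trades a little extra density for tile-granular skip decisions, which is what lets a streaming kernel bypass fully inactive tiles.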

v1.2.1

16 Oct 04:51

What's Changed

  • Implement variable-length attention with mask and bias support by @LoserCheems in #185
  • Add issue/PR templates by @LoserCheems in #186
  • [FEATURE SUPPORT] Variable-Length Attention with Padding-Free Execution by @LoserCheems in #188
  • [FEATURE SUPPORT] Broadcastable 4D mask/bias, 128-rounded key length, stride-0 broadcasting, and dbias reductions by @LoserCheems in #190 (the packed varlen layout and stride-0 broadcasting are sketched below)
  • Refactor bias initialization and enhance bias computation in FlashDMAttnFunc by @LoserCheems in #191
  • Fix attention_mask and attention_bias shape descriptions and remove redundant checks by @LoserCheems in #192
  • Enhance bias gradient accumulation in backward pass by @LoserCheems in #193

Full Changelog: v1.2.0...v1.2.1
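
An illustrative sketch of the padding-free variable-length layout from #188 and the stride-0 broadcasting idea from #190, in plain PyTorch. The `cu_seqlens` convention (cumulative sequence offsets over a packed token dimension) follows the usual FlashAttention varlen layout; this library's exact signatures may differ.

```python
import torch

# Three sequences of lengths 3, 5, 2 packed into one (total_tokens, heads, dim)
# tensor with no padding; cu_seqlens marks where each sequence starts and ends.
lens = torch.tensor([3, 5, 2])
cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.int32),
                        lens.cumsum(0).to(torch.int32)])   # [0, 3, 8, 10]
total, h, d = int(lens.sum()), 2, 16
q = torch.randn(total, h, d)

# Stride-0 broadcasting: a bias stored once per head, shape (1, h, 1, n), is
# expanded to a full (b, h, n, n) view without copying memory; the expanded
# dimensions get stride 0, so the kernel can read it as if it were dense.
n, b = 8, 4
bias = torch.randn(1, h, 1, n)
bias4d = bias.expand(b, h, n, n)
print(cu_seqlens.tolist(), bias4d.stride())  # stride 0 in the broadcast dims
```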

v1.2.0

01 Oct 16:58

What's Changed

  • [BUG FIX] Fix mask/bias memory access and vectorization issues in kernels by @LoserCheems in #182

Full Changelog: v1.1.9...v1.2.0

v1.1.9

22 Sep 16:19

What's Changed

  • Refactor attention mask and bias handling for efficiency by @LoserCheems in #177
  • [BUG FIX] SM80 NaN in bias.grad when both mask and bias are enabled by @LoserCheems in #179

Full Changelog: v1.1.8...v1.1.9

v1.1.8

21 Sep 02:05
ad7a3ab

What's Changed

Full Changelog: v1.1.7...v1.1.8

v1.1.7

20 Sep 18:30
a73c635

What's Changed

Full Changelog: v1.1.6...v1.1.7

v1.1.6

20 Sep 12:41
b3249c6

What's Changed

Full Changelog: v1.1.5...v1.1.6

v1.1.5

20 Sep 12:37
5fe4ed4

What's Changed

  • Expand build matrix for ARM64 and additional CUDA architectures by @LoserCheems in #173 (a build-config sketch follows below)

Full Changelog: v1.1.4...v1.1.5
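
A hypothetical sketch of how a build matrix maps to CUDA architecture flags when compiling a PyTorch CUDA extension (cf. #172/#173). `TORCH_CUDA_ARCH_LIST` and `torch.utils.cpp_extension` are the standard PyTorch mechanisms; the file and package names below are illustrative, not this repo's actual build script.

```python
import os
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# Select target architectures (e.g. Ampere 8.0 and Hopper 9.0) before building;
# torch.utils.cpp_extension translates this into the matching -gencode flags.
os.environ.setdefault("TORCH_CUDA_ARCH_LIST", "8.0;9.0")

setup(
    name="example_ext",
    ext_modules=[CUDAExtension("example_ext", ["example_ext.cu"])],
    cmdclass={"build_ext": BuildExtension},
)
```

A CI build matrix would typically run `python setup.py build_ext` once per architecture/platform combination, varying `TORCH_CUDA_ARCH_LIST` (and the host arch, e.g. ARM64) per job.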

v1.1.4

20 Sep 12:33
3890db2

What's Changed

  • Refine build matrix and CUDA architecture specifications by @LoserCheems in #172

Full Changelog: v1.1.1...v1.1.4