v1.2.1
What's Changed
- Implement variable-length attention with mask and bias support by @LoserCheems in #185
- Add issue/PR templates by @LoserCheems in #186
- [FEATURE SUPPORT] Variable-Length Attention with Padding-Free Execution by @LoserCheems in #188
- [FEATURE SUPPORT] Broadcastable 4D mask/bias, 128‑rounded key length, stride‑0 broadcasting, and dbias reductions by @LoserCheems in #190
- Refactor bias initialization and enhance bias computation in FlashDMAttnFunc by @LoserCheems in #191
- Fix attention_mask and attention_bias shape descriptions and remove redundant checks by @LoserCheems in #192
- Enhance bias gradient accumulation in backward pass by @LoserCheems in #193
Full Changelog: v1.2.0...v1.2.1
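
The broadcastable 4D mask/bias and stride-0 broadcasting mentioned in #190 can be illustrated with plain NumPy. This is a sketch of the broadcasting semantics only, not the library's actual API: the shapes, variable names, and the use of `np.broadcast_to` here are illustrative assumptions.

```python
import numpy as np

# Illustrative shapes (assumed, not from the library):
# attention scores are (batch, heads, seqlen_q, seqlen_k),
# with seqlen_k already a multiple of 128 per the 128-rounded key length.
batch, heads, seqlen_q, seqlen_k = 2, 4, 8, 128

scores = np.zeros((batch, heads, seqlen_q, seqlen_k), dtype=np.float32)

# A broadcastable 4D mask: singleton head/query dims expand with stride 0,
# so the mask is shared across heads and query rows without copying data.
mask = np.random.rand(batch, 1, 1, seqlen_k) > 0.1

# np.broadcast_to returns a stride-0 view rather than materializing copies,
# mirroring the stride-0 broadcasting described in the release notes.
mask_view = np.broadcast_to(mask, scores.shape)

# A bias shared across heads, broadcast the same way; masked positions
# receive -inf before softmax.
bias = np.random.randn(batch, 1, seqlen_q, seqlen_k).astype(np.float32)
bias_view = np.broadcast_to(bias, scores.shape)
masked = np.where(mask_view, scores + bias_view, -np.inf)
```

Because the expanded dimensions have stride 0, a gradient with respect to the bias (the dbias reductions in #190) must sum over exactly those broadcast dimensions to recover the original `(batch, 1, seqlen_q, seqlen_k)` shape.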