v1.2.1
What's Changed
- Implement variable-length attention with mask and bias support by @LoserCheems in #185
- Add issue/PR templates by @LoserCheems in #186
- [FEATURE SUPPORT] Variable-Length Attention with Padding-Free Execution by @LoserCheems in #188
- [FEATURE SUPPORT] Broadcastable 4D mask/bias, 128‑rounded key length, stride‑0 broadcasting, and dbias reductions by @LoserCheems in #190
- Refactor bias initialization and enhance bias computation in FlashDMAttnFunc by @LoserCheems in #191
- Fix attention_mask and attention_bias shape descriptions and remove redundant checks by @LoserCheems in #192
- Enhance bias gradient accumulation in backward pass by @LoserCheems in #193
Full Changelog: v1.2.0...v1.2.1
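
The broadcastable 4D mask/bias and stride-0 broadcasting mentioned in #190 can be illustrated with plain NumPy. This is a sketch of the broadcasting semantics only, not the library's actual API: the shapes, variable names, and the use of `np.broadcast_to` here are illustrative assumptions.

```python
import numpy as np

# Illustrative shapes (assumed, not from the library):
# attention scores are (batch, heads, seqlen_q, seqlen_k),
# with seqlen_k already a multiple of 128 per the 128-rounded key length.
batch, heads, seqlen_q, seqlen_k = 2, 4, 8, 128

scores = np.zeros((batch, heads, seqlen_q, seqlen_k), dtype=np.float32)

# A broadcastable 4D mask: singleton head/query dims expand with stride 0,
# so the mask is shared across heads and query rows without copying data.
mask = np.random.rand(batch, 1, 1, seqlen_k) > 0.1

# np.broadcast_to returns a stride-0 view rather than materializing copies,
# mirroring the stride-0 broadcasting described in the release notes.
mask_view = np.broadcast_to(mask, scores.shape)

# A bias shared across heads, broadcast the same way; masked positions
# receive -inf before softmax.
bias = np.random.randn(batch, 1, seqlen_q, seqlen_k).astype(np.float32)
bias_view = np.broadcast_to(bias, scores.shape)
masked = np.where(mask_view, scores + bias_view, -np.inf)
```

Because the expanded dimensions have stride 0, a gradient with respect to the bias (the dbias reductions in #190) must sum over exactly those broadcast dimensions to recover the original `(batch, 1, seqlen_q, seqlen_k)` shape.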