v0.2.0
What's Changed
- Remove unused CUDA generator includes for improved build performance by @LoserCheems in #105
- [WIP] Support Backward for Dynamic Mask Attention by @LoserCheems in #106
- Fix CUDA forward crash when seqlen_q == 1 by @LoserCheems in #108
- Add backward pass support for FlashDynamicMaskAttention by @LoserCheems in #109 (see the usage sketch after this list)
- Fix varlen mask and bias tensor shapes for all varlen attention functions by @Copilot in #114
- Refactor backward pass and optimize kernel configurations by @LoserCheems in #116
- Integrate Flash Dynamic Mask Attention (FDMA) Into Transformers-Style Attention Flow by @LoserCheems in #118
- Fix attention mask/bias shape documentation by @LoserCheems in #123
- Improve CUDA build configuration and fix gradient computation in attention by @LoserCheems in #124
- Enhance backward pass support and optimization for CUDA architectures by @LoserCheems in #125
- Bump version to 0.2.0 by @LoserCheems in #126
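
With backward support (#109) and the Transformers-style integration (#118), the kernel can now sit inside an ordinary training loop. The sketch below is illustrative only: the import path, the `flash_dmattn_func` name, and the `attn_bias` keyword are assumptions about the package's public interface, not its confirmed signature.

```python
# Hypothetical usage sketch -- flash_dmattn_func and its argument names are
# assumed, not taken from the project's documented API.
import torch
from flash_dmattn import flash_dmattn_func  # assumed import path

batch, seqlen, heads, head_dim = 2, 128, 8, 64
q = torch.randn(batch, seqlen, heads, head_dim,
                device="cuda", dtype=torch.bfloat16, requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

# Dynamic attention bias: one score offset per (head, query, key) pair.
attn_bias = torch.randn(batch, heads, seqlen, seqlen,
                        device="cuda", dtype=torch.bfloat16, requires_grad=True)

out = flash_dmattn_func(q, k, v, attn_bias=attn_bias)  # forward (assumed keyword)
out.sum().backward()                                   # backward pass added in #109
print(q.grad.shape, attn_bias.grad.shape)
```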
Full Changelog: v0.1.0...v0.2.0