v0.2.0

@LoserCheems released this 25 Aug 14:23 · 443 commits to main since this release · f89d9f6

What's Changed

  • Remove unused CUDA generator includes for improved build performance by @LoserCheems in #105
  • [WIP] Support Backward for Dynamic Mask Attention by @LoserCheems in #106
  • Fix CUDA forward crash when seqlen_q == 1 by @LoserCheems in #108
  • Add backward pass support for FlashDynamicMaskAttention by @LoserCheems in #109
  • Fix varlen mask and bias tensor shapes for all varlen attention functions by @Copilot in #114
  • Refactor backward pass and optimize kernel configurations by @LoserCheems in #116
  • Integrate Flash Dynamic Mask Attention (FDMA) Into Transformers-Style Attention Flow by @LoserCheems in #118
  • Fixes attention mask/bias shape documentation by @LoserCheems in #123
  • Improve CUDA build configuration and fix gradient computation in attention by @LoserCheems in #124
  • Enhance backward pass support and optimization for CUDA architectures by @LoserCheems in #125
  • Bumps version to 0.2.0 by @LoserCheems in #126

Full Changelog: v0.1.0...v0.2.0
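
The headline change in this release is backward-pass support for dynamic mask attention (#106, #109, #116, #125), alongside fixes to the mask/bias tensor shapes (#114, #123). As a rough orientation, the sketch below is a minimal pure-PyTorch reference of attention with a boolean mask and an additive, differentiable bias; it is not the library's fused CUDA kernels, and the function name and the (batch, num_heads, seqlen, seqlen) mask/bias layout are assumptions for illustration only.

```python
# Minimal pure-PyTorch reference sketch of dynamic mask attention
# (boolean mask + additive bias). NOT the library's fused CUDA kernel;
# the function name and (batch, heads, seqlen, seqlen) mask/bias layout
# are assumptions for illustration only.
import math
import torch


def dynamic_mask_attention_ref(q, k, v, attn_mask=None, attn_bias=None):
    # q, k, v: (batch, num_heads, seqlen, head_dim) -- assumed layout
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    if attn_bias is not None:
        scores = scores + attn_bias  # additive bias, gradients flow through it
    if attn_mask is not None:
        scores = scores.masked_fill(~attn_mask, float("-inf"))  # drop masked positions
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, v)


# Backward pass: gradients reach q, k, v and the bias.
b, h, s, d = 2, 4, 128, 64
q, k, v = (torch.randn(b, h, s, d, requires_grad=True) for _ in range(3))
bias = torch.randn(b, h, s, s, requires_grad=True)
mask = torch.ones(b, h, s, s, dtype=torch.bool)

out = dynamic_mask_attention_ref(q, k, v, attn_mask=mask, attn_bias=bias)
out.sum().backward()
print(bias.grad.shape)  # torch.Size([2, 4, 128, 128])
```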