v0.2.0

@LoserCheems released this 25 Aug 14:23 · 443 commits to main since this release · f89d9f6

What's Changed

  • Remove unused CUDA generator includes for improved build performance by @LoserCheems in #105
  • [WIP] Support Backward for Dynamic Mask Attention by @LoserCheems in #106
  • Fix CUDA forward crash when seqlen_q == 1 by @LoserCheems in #108
  • Add backward pass support for FlashDynamicMaskAttention by @LoserCheems in #109
  • Fix varlen mask and bias tensor shapes for all varlen attention functions by @Copilot in #114
  • Refactor backward pass and optimize kernel configurations by @LoserCheems in #116
  • Integrate Flash Dynamic Mask Attention (FDMA) Into Transformers-Style Attention Flow by @LoserCheems in #118
  • Fixes attention mask/bias shape documentation by @LoserCheems in #123
  • Improve CUDA build configuration and fix gradient computation in attention by @LoserCheems in #124
  • Enhance backward pass support and optimization for CUDA architectures by @LoserCheems in #125
  • Bumps version to 0.2.0 by @LoserCheems in #126

Full Changelog: v0.1.0...v0.2.0
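
The headline change in this release is backward-pass support for dynamic mask attention (#106, #109, #116, #125), alongside fixes to the mask/bias tensor shapes (#114, #123). As a rough orientation, the sketch below is a minimal pure-PyTorch reference of attention with a boolean mask and an additive, differentiable bias; it is not the library's fused CUDA kernels, and the function name and the (batch, num_heads, seqlen, seqlen) mask/bias layout are assumptions for illustration only.

```python
# Minimal pure-PyTorch reference sketch of dynamic mask attention
# (boolean mask + additive bias). NOT the library's fused CUDA kernel;
# the function name and (batch, heads, seqlen, seqlen) mask/bias layout
# are assumptions for illustration only.
import math
import torch


def dynamic_mask_attention_ref(q, k, v, attn_mask=None, attn_bias=None):
    # q, k, v: (batch, num_heads, seqlen, head_dim) -- assumed layout
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    if attn_bias is not None:
        scores = scores + attn_bias  # additive bias, gradients flow through it
    if attn_mask is not None:
        scores = scores.masked_fill(~attn_mask, float("-inf"))  # drop masked positions
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, v)


# Backward pass: gradients reach q, k, v and the bias.
b, h, s, d = 2, 4, 128, 64
q, k, v = (torch.randn(b, h, s, d, requires_grad=True) for _ in range(3))
bias = torch.randn(b, h, s, s, requires_grad=True)
mask = torch.ones(b, h, s, s, dtype=torch.bool)

out = dynamic_mask_attention_ref(q, k, v, attn_mask=mask, attn_bias=bias)
out.sum().backward()
print(bias.grad.shape)  # torch.Size([2, 4, 128, 128])
```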