Flash Attention: Setup #821


Open · wants to merge 77 commits into main

Conversation

@louisfd (Member) commented Aug 15, 2025

Add cubecl-attention crate for FlashAttention

  • Introduces the cubecl-attention crate with the same hierarchical levels as matmul: batch, global, stage, tile (see the sketch below).
  • The tile level differs the most from matmul: tile attention currently performs round trips to shared memory for operations within fragments. This will be optimized once we have finer control over MMA fragments.
  • The algorithm currently supports only the accelerated 8×8×8 matmul shape (used with Metal) and is hardcoded to that configuration.

TODOs:

  • Generalize tile sizes.
  • Stage: support partitioning with multiple planes.
  • Global: iterate over multiple stages.
  • Batch: full-problem dispatch.

Despite these limitations, reusing the matmul architecture should make further development straightforward.
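
For orientation, here is a minimal, hypothetical sketch of how the four levels could stack, mirroring the matmul layering. None of these trait or method names are taken from the actual cubecl-attention API; they only illustrate the hierarchy described above.

```rust
// Hypothetical sketch of the batch/global/stage/tile layering, mirroring
// cubecl-matmul. Names and signatures are illustrative only.

/// Tile level: attention on a single MMA-sized fragment. Currently
/// hardcoded to the accelerated 8x8x8 shape, with round trips to shared
/// memory for in-fragment operations.
pub trait TileAttention {
    fn execute_tile(&self);
}

/// Stage level: runs tile attention over the tiles of one shared-memory
/// stage (single-plane for now; multi-plane partitioning is a TODO).
pub trait StageAttention {
    type Tile: TileAttention;
    fn execute_stage(&self);
}

/// Global level: streams data from global memory into stages (iterating
/// over multiple stages is a TODO).
pub trait GlobalAttention {
    type Stage: StageAttention;
    fn execute_global(&self);
}

/// Batch level: dispatches the global kernel across the batch dimension
/// (full-problem dispatch is a TODO).
pub trait BatchAttention {
    type Global: GlobalAttention;
    fn execute_batch(&self);
}
```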

Validate your PR with burn.

Make sure this PR does not introduce any regressions in burn.

Instructions

  • Create a new branch or fork of the burn repo.
  • Update burn's main Cargo.toml to point its cubecl dependencies at this PR's commit hash (see the sketch after this list).
  • Fix any broken tests or compilation errors in burn.
  • Submit a PR in burn with your fixes and link it here.
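
As an illustration, the Cargo.toml change might look like the following, assuming burn pins cubecl as a git dependency in its workspace manifest. The crate entries shown are illustrative, and `<this-PR-commit-hash>` is a placeholder for the actual revision:

```toml
# Hypothetical sketch: pin burn's cubecl dependencies to this PR's commit.
# The exact dependency entries in burn's Cargo.toml may differ.
[workspace.dependencies]
cubecl = { git = "https://github.com/tracel-ai/cubecl", rev = "<this-PR-commit-hash>" }
cubecl-common = { git = "https://github.com/tracel-ai/cubecl", rev = "<this-PR-commit-hash>" }
```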

@wingertge (Contributor) commented:

> This will be optimized once we have finer control over MMA fragments.

Good news 😅
