fix(sparse-attn): correct block_size computation in backward kernel by Chuge0335 · Pull Request #1 · Chuge0335/FastVideo

Chuge0335 · 2025-12-09T09:49:35Z

fix(sparse-attn): correct block_size masking in backward kernel

The backward kernel used an incorrect block_size when applying the mask, causing padding positions to be treated as valid tokens. As a result, p = tl.math.exp2(qk - m) was computed on invalid entries, leading to Inf and NaN values during gradient accumulation (especially in dQ).
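The toy, pure-Python sketch below shows how a too-wide mask turns padding into NaN. It is an illustration under stated assumptions, not the FastVideo kernel. It assumes that padded entries are loaded as 0.0 and that a padded query row carries m = -inf. `exp2` is a scalar stand-in for `tl.math.exp2`, and `backward_tile_dq` is a hypothetical helper that uses head dimension 1 and omits the softmax scale. When block_size counts the padding, exp2(0 - (-inf)) evaluates to Inf. The dS product then multiplies that Inf by 0 and produces NaN.

```python
import math


def exp2(x: float) -> float:
    # Scalar stand-in for tl.math.exp2 with IEEE handling of NaN and +/-inf.
    if math.isnan(x):
        return math.nan
    if math.isinf(x):
        return math.inf if x > 0 else 0.0
    return 2.0 ** x


def backward_tile_dq(qk, m, dp, delta, k, block_size):
    """Toy dQ accumulation for one square tile (hypothetical, head dim 1).

    qk[i][j]: scores in log2 units; m[i]: per-row statistic saved by the
    forward pass; dp[i][j]: dO_i . V_j; delta[i]: rowsum(dO_i * O_i);
    k[j]: key values. Rows/columns at index >= block_size are padding.
    """
    dq = []
    for i, row in enumerate(qk):
        acc = 0.0
        for j, s in enumerate(row):
            valid = i < block_size and j < block_size
            # Select instead of multiply, like tl.where(mask, p, 0.0), so an
            # Inf produced on a padded entry is discarded rather than kept.
            p = exp2(s - m[i]) if valid else 0.0
            ds = p * (dp[i][j] - delta[i])
            acc += ds * k[j]
        dq.append(acc)
    return dq


# Assumed data: a 4x4 tile whose last split block holds only 3 real tokens.
# Padded entries are loaded as 0.0 and the padded row's m is -inf.
qk = [[0.5, 1.0, -0.5, 0.0],
      [1.5, 0.0, 0.5, 0.0],
      [-1.0, 0.5, 1.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]]
m = [2.0, 2.0, 2.0, -math.inf]
dp = [[0.1, 0.2, 0.3, 0.0],
      [0.0, -0.1, 0.2, 0.0],
      [0.3, 0.1, -0.2, 0.0],
      [0.0, 0.0, 0.0, 0.0]]
delta = [0.1, 0.05, 0.0, 0.0]
k = [1.0, 2.0, 3.0, 0.0]

good = backward_tile_dq(qk, m, dp, delta, k, block_size=3)  # real token count
bad = backward_tile_dq(qk, m, dp, delta, k, block_size=4)   # full tile width
print(good[3], bad[3])  # 0.0 nan
```

The valid rows are identical in both runs. Only the padded row differs: it becomes NaN, and that NaN corrupts the result if it is stored or reduced into another gradient.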

This patch fixes the block_size computation for split blocks so that the mask correctly excludes padded regions in all cases.
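The PR does not show the split-block arithmetic itself. The sketch below is therefore a hedged illustration of the general idea only. It assumes that a sequence of length seq_len is cut into tiles of `block` positions and that only the tile reaching the end of the sequence can be partial. `valid_block_size` and `tile_mask` are hypothetical names. The point is that the mask bound should be the number of real tokens in the tile, not the tile width.

```python
def valid_block_size(seq_len: int, block_start: int, block: int) -> int:
    # Real tokens in the tile starting at block_start; only the last tile
    # of the sequence can be partial, and tiles past the end hold none.
    return max(0, min(block, seq_len - block_start))


def tile_mask(seq_len: int, block_start: int, block: int) -> list:
    # Same as evaluating `offs < valid_block_size(...)` for offs = 0..block-1.
    n = valid_block_size(seq_len, block_start, block)
    return [off < n for off in range(block)]


seq_len, block = 10, 4
print([valid_block_size(seq_len, s, block) for s in range(0, seq_len, block)])  # [4, 4, 2]
print(tile_mask(seq_len, 8, block))  # [True, True, False, False]
```

If the last tile instead used `block` as its size, its mask would be all True, and the two padded positions would be processed as real tokens.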

fix(sparse-attn): correct block_size computation in backward kernel #1
Chuge0335 wants to merge 1 commit into main from fix-triton-backward

Chuge0335 commented Dec 9, 2025
