
Conversation

@LoserCheems
Collaborator

Correct the order of operations in the attention bias calculation for improved numerical stability, and introduce window size handling. Adjust the dbias shape and broadcasting logic to ensure proper dimension management during the backward pass.

Corrects parenthesization so the matrix scaling is applied before the transpose when building the attention bias, aligning with the intended formula and improving numerical stability and broadcasting.
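A minimal sketch of the precedence issue, with hypothetical names and shapes (the actual tensors live in examples/modeling/modeling_doge.py and are not shown on this page): `dt_states` stands in for a [batch, seqlen_q, num_heads] projection output and `A` for a per-head scaling coefficient.

```python
import torch
import torch.nn.functional as F

batch, seq_len, num_heads = 2, 16, 4
dt_states = torch.randn(batch, seq_len, num_heads)  # hypothetical [B, Lq, H] projection output
A = torch.randn(num_heads)                          # hypothetical per-head scaling coefficient

# Before: transposing first leaves the last axis as seqlen_q, so A (shape [H])
# broadcasts against the wrong axis and raises whenever H != Lq.
# bias_wrong = A * F.softplus(dt_states).transpose(-1, -2)

# After: scale per head while heads are still the last axis, then transpose
# to [B, H, Lq] for the attention kernel.
bias = (A * F.softplus(dt_states)).transpose(-1, -2)
```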

Passes the window size into the attention kernel to enable correct windowed masking behavior.
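For intuition, here is what a (left, right) window constrains, written as a standalone PyTorch sketch; the names and the (left, right) tuple convention follow the common flash-attention style and are an assumption, not the kernel's actual implementation.

```python
import torch

def window_mask(seqlen_q: int, seqlen_k: int, window_size: tuple[int, int]) -> torch.Tensor:
    """Hypothetical illustration: key j is visible from query i only if
    i - left <= j <= i + right."""
    left, right = window_size
    q_idx = torch.arange(seqlen_q).unsqueeze(-1)  # [Lq, 1]
    k_idx = torch.arange(seqlen_k).unsqueeze(0)   # [1, Lk]
    return (k_idx >= q_idx - left) & (k_idx <= q_idx + right)

print(window_mask(4, 4, (1, 0)).int())  # lower-bidiagonal visibility pattern
```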
Fixes incorrect dbias dimension handling in the backward pass by deriving the batch size and query length from the bias tensor rather than from reused variables, ensuring correct allocation; a sketch covering this and the reduction logic follows the next note.

Updates the expansion/reduction logic for MQA/GQA and broadcast dimensions (batch or seqlen_q == 1) to sum over the correct axes, preventing mis-shaped outputs, as sketched below.
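The actual change is in csrc/flash_dmattn/flash_api.cpp; this Python sketch only mirrors the intended logic under an assumed [batch, heads, seqlen_q, seqlen_k] layout, with all names (including batch_size_dbias and seqlen_q_dbias, echoed from the PR) hypothetical.

```python
import torch

def reduce_dbias(dbias_expanded: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    """Reduce a fully expanded bias gradient back to the bias tensor's own shape.
    Dimensions are derived from `bias` itself, not from reused query-side variables."""
    batch_size_dbias, num_heads_bias, seqlen_q_dbias, _ = bias.shape
    out = dbias_expanded
    # MQA/GQA: gradients from all query heads in a group sum into one bias head.
    if num_heads_bias != out.shape[1]:
        group = out.shape[1] // num_heads_bias
        out = out.view(out.shape[0], num_heads_bias, group, *out.shape[2:]).sum(dim=2)
    # Bias broadcast over batch (batch == 1): sum the batch axis.
    if batch_size_dbias == 1 and out.shape[0] != 1:
        out = out.sum(dim=0, keepdim=True)
    # Bias broadcast over query length (seqlen_q == 1): sum the query axis.
    if seqlen_q_dbias == 1 and out.shape[2] != 1:
        out = out.sum(dim=2, keepdim=True)
    return out
```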

Removes unused variables for clarity.
Copilot AI review requested due to automatic review settings on October 27, 2025 at 08:58
Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes the attention bias calculation by correcting the order of operations, improving numerical stability; updates the dbias handling logic in the backward pass by introducing dedicated dimension-tracking variables; and adds window size parameter support.

  • Corrects parenthesization in attention bias calculation to ensure multiplication happens before transpose operations
  • Adds window_size parameter to the attention interface call
  • Refactors dbias dimension tracking by replacing batch_size_bias and seqlen_q_bias with batch_size_dbias and seqlen_q_dbias

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description

examples/modeling/modeling_doge.py
    Fixes operator precedence in the attention bias calculation and adds the window_size parameter to the attention call
csrc/flash_dmattn/flash_api.cpp
    Removes unused mask/bias dimension variables and introduces dedicated dbias dimension-tracking variables for proper backward-pass handling


@LoserCheems merged commit 424b733 into main on Oct 27, 2025
3 of 4 checks passed