Conversation

@FightingZhen (Contributor) commented May 26, 2025

What does this PR do?

1. Problem 1
Flex-attention has not been fully verified on Ascend NPU yet. In PR #37866, the function is_torch_flex_attn_available (shown below) does not include any check for Ascend NPU. As a result, with torch>=2.5.0 this function returns True on Ascend NPU, which is not correct.

def is_torch_flex_attn_available():
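As a rough sketch of the kind of guard this PR adds (not necessarily the exact patch), the check could short-circuit on Ascend NPU. This assumes the existing is_torch_available, is_torch_npu_available, and get_torch_version helpers from transformers.utils and simplifies the version logic:

from packaging import version
from transformers.utils import get_torch_version, is_torch_available, is_torch_npu_available

def is_torch_flex_attn_available():
    # Assumption: flex-attention is treated as unsupported on Ascend NPU,
    # so report it as unavailable there regardless of the torch version.
    if is_torch_npu_available():
        return False
    # Simplified form of the existing check: flex-attention requires torch >= 2.5.0.
    return is_torch_available() and version.parse(get_torch_version()) >= version.parse("2.5.0")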

2. Problem 2
If is_torch_flex_attn_available returns False on Ascend NPU as expected, the BlockMask object is not imported by the code below:

from torch.nn.attention.flex_attention import BlockMask, create_block_mask

However, this object is currently referenced directly in type annotations, as in the code below, which causes an ImportError:

attention_mask: Optional[Union[torch.Tensor, BlockMask]],

This PR therefore solves both problems: it adds a check to is_torch_flex_attn_available so that flex-attention is reported as unsupported on Ascend NPU, and it converts the BlockMask type annotations to string (forward-reference) format, as sketched below.
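A minimal sketch of the forward-reference pattern (illustrative only; example_forward is a hypothetical function name, not code from this PR):

from typing import Optional, Union

import torch
from transformers.utils import is_torch_flex_attn_available

if is_torch_flex_attn_available():
    # Only imported on backends where flex-attention is supported.
    from torch.nn.attention.flex_attention import BlockMask

def example_forward(attention_mask: Optional[Union[torch.Tensor, "BlockMask"]] = None) -> None:
    # Hypothetical example, not part of the actual diff. "BlockMask" is a string
    # annotation (forward reference), so this module imports cleanly even when
    # the class itself was never imported.
    print(type(attention_mask))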

Fixes #38362

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@FightingZhen FightingZhen marked this pull request as draft May 26, 2025 09:13
@FightingZhen FightingZhen deleted the bugfix_flex_attn branch August 14, 2025 01:52
