custom 4d attention masks broken by #28937 (#29525)

@poedator

System Info

Version 4.38.2 breaks code that uses custom 4d attention masks (introduced in #27539). Apparently, the custom mask gets replaced here:

causal_mask = attention_mask
if attention_mask is not None and cache_position is not None:
    causal_mask = causal_mask[:, :, cache_position, : key_states.shape[-2]]
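For readers unfamiliar with the shapes involved, here is a minimal sketch of what a custom 4d mask can look like and what the indexing expression above selects from it. It uses NumPy arrays as a stand-in for torch tensors, assumes the usual (batch, heads, query_length, key_length) layout, and the helper `packed_causal_mask`, the `kv_len` variable, and all values are hypothetical. It only illustrates the indexing and is not a reproduction of the failure.

```python
import numpy as np

def packed_causal_mask(lengths):
    """Hypothetical helper: build a (1, 1, total, total) 0/1 mask that is
    causal within each packed segment and zero across segments."""
    total = sum(lengths)
    mask = np.zeros((total, total), dtype=np.int64)
    start = 0
    for n in lengths:
        # Lower-triangular block: token i attends to tokens <= i of its own segment.
        mask[start:start + n, start:start + n] = np.tril(np.ones((n, n), dtype=np.int64))
        start += n
    return mask[None, None, :, :]

# Two sequences of lengths 3 and 2 packed into one row of 5 tokens.
attention_mask = packed_causal_mask([3, 2])

# Stand-ins for the names in the quoted snippet (values are assumptions).
cache_position = np.arange(5)  # positions of the query tokens
kv_len = 5                     # plays the role of key_states.shape[-2]

# Same indexing as the quoted line: pick query rows, truncate key columns.
causal_mask = attention_mask[:, :, cache_position, :kv_len]
print(causal_mask.shape)
print(causal_mask[0, 0])
```

With these values the slice selects every row and column, so the block-diagonal pattern should come through unchanged. That is the behaviour code relying on custom 4d masks expects.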

The issue was introduced with #28937. It is unclear whether the relevant slow tests for 4d masks were run then, but they fail now:

RUN_SLOW=1 python -m pytest -v ./tests/test_modeling_utils.py::Mask4DTestFP32
FAILED tests/test_modeling_utils.py::Mask4DTestFP32::test_attention - AttributeError: 'NoneType' object has no attribute 'shape'
FAILED tests/test_modeling_utils.py::Mask4DTestFP32::test_causal_model_logits - AssertionError: Tensor-likes are not close!
FAILED tests/test_modeling_utils.py::Mask4DTestFP32::test_inner_model - AssertionError: Tensor-likes are not close!

RUN_SLOW=1 python -m pytest -v ./tests/test_modeling_utils.py::Mask4DTestFP16
FAILED tests/test_modeling_utils.py::Mask4DTestFP16::test_attention - AttributeError: 'NoneType' object has no attribute 'shape'
FAILED tests/test_modeling_utils.py::Mask4DTestFP16::test_causal_model_logits - AssertionError: Tensor-likes are not close!

Please fix this or suggest a workaround.

summoning @ArthurZucker
cc @gante @younesbelkada
