
[Bug]: AiterFlashAttentionImpl.__init__() got multiple values for argument 'use_irope' for llama4 model #19867

Open
@hongxiayang

Description


Your current environment


🐛 Describe the bug

We hit an exception when running llama4 models with the latest code on ROCm V1:

(VllmWorker rank=2 pid=267) ERROR 06-19 01:00:39 [multiproc_executor.py:488] TypeError: AiterFlashAttentionImpl.__init__() got multiple values for argument 'use_irope'
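For context, this TypeError is the standard Python failure when a parameter receives both a positional and a keyword value, typically because a caller forwards `*args` into a constructor that also gets the argument explicitly. A minimal sketch of that failure mode (hypothetical names, not the actual vLLM code):

```python
# Illustrative sketch only: how a backend constructor can receive
# "multiple values" for 'use_irope'. Names are hypothetical and are
# not taken from the vLLM source.

class AttentionImplSketch:
    # Hypothetical signature loosely modeled on an attention backend.
    def __init__(self, num_heads, head_size, use_irope=False):
        self.num_heads = num_heads
        self.head_size = head_size
        self.use_irope = use_irope


def build_backend(*args, use_irope=False):
    # Bug: if `args` already contains a value that lands in use_irope's
    # positional slot, Python raises
    # "got multiple values for argument 'use_irope'".
    return AttentionImplSketch(*args, use_irope=use_irope)


try:
    # Three positional args: the third fills use_irope positionally,
    # and use_irope is also passed by keyword -> TypeError.
    build_backend(8, 128, True, use_irope=True)
except TypeError as exc:
    print(exc)  # ...got multiple values for argument 'use_irope'
```

The usual fix is to pop the duplicated key out of the forwarded arguments (or make it keyword-only) before calling the constructor.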

Current workaround:
Turn off AITER MHA by setting VLLM_ROCM_USE_AITER_MHA=0.
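The workaround can also be applied from Python before vLLM initializes, as a minimal sketch (the environment variable name is taken from this issue; the placement is illustrative):

```python
import os

# Workaround from the issue: disable the AITER multi-head-attention
# backend on ROCm. Must be set before vLLM selects its attention
# backend, so do it before importing/constructing the engine.
os.environ["VLLM_ROCM_USE_AITER_MHA"] = "0"
```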

Proposal:

  • Fix the bug (the team is working on it)
  • Add an end-to-end test for one of the small llama4 models

The motivation for adding an end-to-end test for a small llama4 model is that llama4 models have broken in the past precisely because such tests were lacking.

Before submitting a new issue...

  • Make sure you have already searched for relevant issues and asked the chatbot at the bottom-right corner of the documentation page, which can answer many frequently asked questions.

Labels

bug (Something isn't working), rocm (Related to AMD ROCm)
