Your current environment
The output of `python collect_env.py`
Your output of `python collect_env.py` here
🐛 Describe the bug
We hit an exception when running llama4 models with the latest code on ROCm V1:
```
(VllmWorker rank=2 pid=267) ERROR 06-19 01:00:39 [multiproc_executor.py:488] TypeError: AiterFlashAttentionImpl.__init__() got multiple values for argument 'use_irope'
```
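For context, this `TypeError` is Python's standard complaint when a parameter receives both a positional and a keyword value. A minimal standalone sketch that reproduces the same message (the class and signature here are invented for illustration, not the actual vLLM code):

```python
# Hypothetical stand-in for AiterFlashAttentionImpl; the signature is invented
# purely to illustrate the failure mode, not taken from vLLM.
class AttnImpl:
    def __init__(self, num_heads, use_irope=False):
        self.num_heads = num_heads
        self.use_irope = use_irope

# The caller fills `use_irope` positionally AND forwards it as a keyword:
AttnImpl(8, True, use_irope=True)
# TypeError: AttnImpl.__init__() got multiple values for argument 'use_irope'
```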
Current workaround:
Turn off AITER_MHA by setting `VLLM_ROCM_USE_AITER_MHA=0`, e.g. as sketched below.
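A minimal sketch of applying the workaround in-process; the model name is a placeholder for whichever llama4 checkpoint you are serving, and the variable must be set before the engine is constructed:

```python
import os

# Disable the AITER multi-head attention backend before vLLM starts up
# (the workaround from above; remove once the bug is fixed).
os.environ["VLLM_ROCM_USE_AITER_MHA"] = "0"

from vllm import LLM  # import after the env var is set, to be safe

# Placeholder model name and parallelism; substitute your own setup.
llm = LLM(model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
          tensor_parallel_size=8)
```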
Proposal:
- Fix the bug (the team is working on it)
- Add an end-to-end test for one of the small llama4 models
The motivation for adding an end-to-end test for a small llama4 variant is that llama4 models have broken in the past precisely because such coverage was missing; a sketch of what the test could look like follows.
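As a rough illustration of the proposed smoke test (the model name is a placeholder, not the actual small checkpoint CI would use):

```python
# A rough sketch of the proposed end-to-end smoke test. The model name is an
# assumption: substitute whichever small llama4 checkpoint CI can afford to load.
import pytest

from vllm import LLM, SamplingParams

@pytest.mark.parametrize("model", ["meta-llama/Llama-4-Scout-17B-16E-Instruct"])
def test_llama4_end_to_end(model):
    # The reported bug crashed during engine construction, so getting past
    # this line is already a meaningful regression signal.
    llm = LLM(model=model)
    params = SamplingParams(temperature=0.0, max_tokens=16)
    outputs = llm.generate(["The capital of France is"], params)
    assert outputs and outputs[0].outputs[0].text.strip()
```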
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.