[Bug]: FlashMLA: invalid configuration argument

### Your current environment

<details>
<summary>The output of <code>python collect_env.py</code></summary>

```text
Your output of `python collect_env.py` here
```

</details>


### 🐛 Describe the bug

On B200, start the server with:

`vllm serve deepseek-ai/DeepSeek-V3.2-Exp -dp 8 --enable-expert-parallel --port 9256`

Then run the benchmark against it:

`vllm bench serve --model deepseek-ai/DeepSeek-V3.2-Exp --dataset-name random --host 127.0.0.1 --port 9256 --random-input-len 256 --random-output-len 256 --request-rate inf --num-prompts 1024`

The server then fails with the following error:

```text
CUDA error (/home/wentao/vllm-source/cmake-build-release/_deps/flashmla-src/csrc/smxx/mla_combine.cu:201): invalid configuration argument
CUDA error (/home/wentao/vllm-source/cmake-build-release/_deps/flashmla-src/csrc/smxx/mla_combine.cu:201): invalid configuration argument
CUDA error (/home/wentao/vllm-source/cmake-build-release/_deps/flashmla-src/csrc/smxx/mla_combine.cu:201): invalid configuration argument
CUDA error (/home/wentao/vllm-source/cmake-build-release/_deps/flashmla-src/csrc/smxx/mla_combine.cu:201): invalid configuration argument
CUDA error (/home/wentao/vllm-source/cmake-build-release/_deps/flashmla-src/csrc/smxx/mla_combine.cu:201): invalid configuration argument
CUDA error (/home/wentao/vllm-source/cmake-build-release/_deps/flashmla-src/csrc/smxx/mla_combine.cu:201): invalid configuration argument
CUDA error (/home/wentao/vllm-source/cmake-build-release/_deps/flashmla-src/csrc/smxx/mla_combine.cu:201): invalid configuration argument
(APIServer pid=767475) ERROR 10-16 10:21:18 [core_client.py:597] Engine core proc EngineCore_DP6 died unexpectedly, shutting down client.
(APIServer pid=767475) ERROR 10-16 10:21:21 [async_llm.py:524] AsyncLLM output_handler failed.
(APIServer pid=767475) ERROR 10-16 10:21:21 [async_llm.py:524] Traceback (most recent call last):
(APIServer pid=767475) ERROR 10-16 10:21:21 [async_llm.py:524]   File "/home/wentao/vllm-source/vllm/v1/engine/async_llm.py", line 478, in output_handler
(APIServer pid=767475) ERROR 10-16 10:21:21 [async_llm.py:524]     outputs = await engine_core.get_output_async()
(APIServer pid=767475) ERROR 10-16 10:21:21 [async_llm.py:524]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=767475) ERROR 10-16 10:21:21 [async_llm.py:524]   File "/home/wentao/vllm-source/vllm/v1/engine/core_client.py", line 882, in get_output_async
(APIServer pid=767475) ERROR 10-16 10:21:21 [async_llm.py:524]     raise self._format_exception(outputs) from None
(APIServer pid=767475) ERROR 10-16 10:21:21 [async_llm.py:524] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause
```
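
For context: CUDA reports `invalid configuration argument` (`cudaErrorInvalidConfiguration`) when a kernel is launched with grid or block dimensions outside the device limits. The root cause at `mla_combine.cu:201` has not been confirmed, so the following is only a rough sketch of that general check. `launch_config_is_valid` is a hypothetical helper (not part of vLLM or FlashMLA), and the limits are the ones the CUDA programming guide lists for recent compute capabilities.

```python
# Hypothetical helper for illustration only; not part of vLLM or FlashMLA.
# Limits assumed from the CUDA programming guide for recent compute capabilities.
MAX_GRID = (2**31 - 1, 65535, 65535)   # max grid size in x, y, z
MAX_BLOCK = (1024, 1024, 64)           # max block size in x, y, z
MAX_THREADS_PER_BLOCK = 1024           # max total threads in one block


def launch_config_is_valid(grid, block):
    """Return True if a (grid, block) launch shape fits the assumed limits."""
    # Every dimension must be at least 1 and at most its per-axis maximum.
    if any(g < 1 or g > limit for g, limit in zip(grid, MAX_GRID)):
        return False
    if any(b < 1 or b > limit for b, limit in zip(block, MAX_BLOCK)):
        return False
    # The product of the block dimensions is capped separately.
    return block[0] * block[1] * block[2] <= MAX_THREADS_PER_BLOCK


print(launch_config_is_valid((1024, 128, 1), (256, 1, 1)))  # within limits
print(launch_config_is_valid((1, 70000, 1), (256, 1, 1)))   # grid.y too large
print(launch_config_is_valid((0, 1, 1), (256, 1, 1)))       # empty grid
```

If a shape derived from the batch (for example a grid dimension that scales with the number of in-flight requests) fails a check like this, the launch would be rejected with this error. Whether that is what happens here still needs to be verified against the FlashMLA source.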

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Bug]: FlashMLA: invalid configuration argument #27043
