Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
I'm using InternVL3-8B-AWQ for inference with vLLM, and it is much slower than InternVL3-8B.
The device I'm using:
RTX 4090D (24 GB)
vLLM==0.9.0
Time to first token:
- InternVL3-8B: 0.81 s
- InternVL3-8B-AWQ: 1.40 s
Server launch command:
python3 -m vllm.entrypoints.openai.api_server \
    --model models--OpenGVLab--InternVL3-8B-AWQ \
    --gpu-memory-utilization 0.9 \
    --max_num_seqs 1 \
    --max-model-len 16384 \
    --served-model-name "vlm_test" \
    --limit-mm-per-prompt image=5 \
    --quantization awq \
    --trust-remote-code
Are any special settings needed for the AWQ model?
Reproduction
python3 -m vllm.entrypoints.openai.api_server \
    --model models--OpenGVLab--InternVL3-8B-AWQ \
    --gpu-memory-utilization 0.9 \
    --max_num_seqs 1 \
    --max-model-len 16384 \
    --served-model-name "vlm_test" \
    --limit-mm-per-prompt image=5 \
    --quantization awq \
    --trust-remote-code
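
Requests are sent through the OpenAI-compatible endpoint. Below is a minimal sketch of a streaming client that measures time to first token; the `openai` Python package, default port 8000, and the prompt/image URL are illustrative assumptions, not the exact request used for the numbers above.

```python
# Sketch of a streaming request to measure time to first token (TTFT).
# Assumptions: the `openai` Python package, vLLM's default port 8000,
# and a placeholder prompt/image URL -- not the exact request used above.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
stream = client.chat.completions.create(
    model="vlm_test",  # matches --served-model-name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
    stream=True,
    max_tokens=128,
)

ttft = None
for chunk in stream:
    delta = chunk.choices[0].delta if chunk.choices else None
    if ttft is None and delta is not None and delta.content:
        ttft = time.perf_counter() - start
        print(f"Time to first token: {ttft:.2f}s")
    # keep iterating so the full response is consumed
```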
Environment
RTX 4090D (24 GB)
vLLM==0.9.0