Conversation

@Isotr0py
Member

@Isotr0py Isotr0py commented Oct 23, 2025

Purpose

TODO

  • Add some benchmark results. (Not really needed; the code on the main branch is terribly slow, and it should work now.)

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
@Isotr0py Isotr0py requested a review from sighingnow as a code owner October 23, 2025 14:03
@mergify mergify bot added the qwen Related to Qwen models label Oct 23, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request replaces Conv3d layers with ReplicatedLinear layers in several model files (glm4_1v.py, qwen2_5_vl.py, qwen2_vl.py, qwen3_omni_moe_thinker.py, qwen3_vl.py) and introduces a utility function conv3d_to_linear_weight in vision.py to handle weight conversion. The change aims to provide a temporary solution for issue #27406, leveraging the specific case where kernel_size equals stride in the Conv3d layers, allowing for replacement with a linear layer. The review focuses on the correctness of the weight conversion and the impact on model functionality.
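The equivalence the PR relies on can be sketched as follows. This is an illustrative example with made-up shapes (3 input channels, 2x14x14 patches, hidden size 32), not the models' actual configurations: a Conv3d whose kernel_size equals its stride visits each patch exactly once with no overlap, so flattening its kernel yields a Linear layer that computes the same result.

```python
import torch
import torch.nn as nn

# Illustrative shapes only (not the real model configs).
t, p = 2, 14              # temporal_patch_size, patch_size
in_ch, hidden = 3, 32

conv = nn.Conv3d(in_ch, hidden, kernel_size=(t, p, p), stride=(t, p, p), bias=False)
linear = nn.Linear(in_ch * t * p * p, hidden, bias=False)
# Flattening the conv kernel gives the equivalent linear weight.
linear.weight.data.copy_(conv.weight.data.view(hidden, -1))

x = torch.randn(5, in_ch, t, p, p)    # 5 non-overlapping patches
y_conv = conv(x).view(5, hidden)      # one output position per patch
y_linear = linear(x.view(5, -1))      # same computation as a single matmul
print(torch.allclose(y_conv, y_linear, atol=1e-5))  # True
```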

Comment on lines +491 to 493
return_bias=False,
)

Contributor

critical

With nn.Conv3d replaced by ReplicatedLinear, the reshaping that adapted x to the convolution's expected input format has been removed. Please verify that x now reaches the ReplicatedLinear layer in the shape it expects; a dimension mismatch here would cause incorrect computations or runtime errors.

Can you confirm that the input tensor x is correctly preprocessed to match the expected input shape of the ReplicatedLinear layer?

Comment on lines 548 to 550
 def forward(self, x: torch.Tensor) -> torch.Tensor:
-    L, C = x.shape
-    x = x.view(L, -1, self.temporal_patch_size, self.patch_size, self.patch_size)
-    x = self.proj(x).view(L, self.hidden_size)
+    x = self.proj(x)
     return x
Contributor

critical

With nn.Conv3d replaced by ReplicatedLinear, the reshaping that adapted x to the convolution's expected input format has been removed. Please verify that x now reaches the ReplicatedLinear layer in the shape it expects; a dimension mismatch here would cause incorrect computations or runtime errors.

Can you confirm that the input tensor x is correctly preprocessed to match the expected input shape of the ReplicatedLinear layer?

Comment on lines 579 to 581
 def forward(self, x: torch.Tensor) -> torch.Tensor:
-    L, C = x.shape
-    x = x.view(L, -1, self.temporal_patch_size, self.patch_size, self.patch_size)
-    x = self.proj(x).view(L, self.embed_dim)
+    x = self.proj(x)
     return x
Contributor

critical

With nn.Conv3d replaced by ReplicatedLinear, the reshaping that adapted x to the convolution's expected input format has been removed. Please verify that x now reaches the ReplicatedLinear layer in the shape it expects; a dimension mismatch here would cause incorrect computations or runtime errors.

Can you confirm that the input tensor x is correctly preprocessed to match the expected input shape of the ReplicatedLinear layer?

Comment on lines 142 to +152
 def forward(self, x: torch.Tensor) -> torch.Tensor:
-    L, C = x.shape
-    x = x.view(L, -1, self.temporal_patch_size, self.patch_size, self.patch_size)
-    x = self.proj(x).view(L, self.hidden_size)
+    x = self.proj(x)
Contributor

critical

With nn.Conv3d replaced by ReplicatedLinear, the reshaping that adapted x to the convolution's expected input format has been removed. Please verify that x now reaches the ReplicatedLinear layer in the shape it expects; a dimension mismatch here would cause incorrect computations or runtime errors.

Can you confirm that the input tensor x is correctly preprocessed to match the expected input shape of the ReplicatedLinear layer?

Comment on lines +149 to 150
+    x = self.proj(x)
     return x
Contributor

critical

With nn.Conv3d replaced by ReplicatedLinear, the reshaping that adapted x to the convolution's expected input format has been removed. Please verify that x now reaches the ReplicatedLinear layer in the shape it expects; a dimension mismatch here would cause incorrect computations or runtime errors.

Can you confirm that the input tensor x is correctly preprocessed to match the expected input shape of the ReplicatedLinear layer?

)

 def forward(self, x: torch.Tensor) -> torch.Tensor:
-    L, C = x.shape
Contributor

critical

With nn.Conv3d replaced by ReplicatedLinear, the reshaping that adapted x to the convolution's expected input format has been removed. Please verify that x now reaches the ReplicatedLinear layer in the shape it expects; a dimension mismatch here would cause incorrect computations or runtime errors.

Can you confirm that the input tensor x is correctly preprocessed to match the expected input shape of the ReplicatedLinear layer?

return llm_pos_ids


def conv3d_to_linear_weight(conv3d_weight: torch.Tensor) -> torch.Tensor:
Member

comment why we need this?
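For context, such a helper presumably just flattens the kernel dimensions of the checkpoint's Conv3d weight so it can be loaded into the linear layer. A sketch of what that conversion amounts to (shapes are hypothetical; this is not necessarily the PR's exact implementation):

```python
import torch

def conv3d_to_linear_weight(conv3d_weight: torch.Tensor) -> torch.Tensor:
    """Flatten a Conv3d kernel (out_ch, in_ch, kT, kH, kW) into a Linear
    weight (out_ch, in_ch * kT * kH * kW).

    This is only valid because kernel_size == stride in these models, so
    the convolution reduces to a matmul over each flattened patch.
    """
    return conv3d_weight.flatten(start_dim=1)

w = torch.randn(32, 3, 2, 14, 14)        # hypothetical kernel shape
print(conv3d_to_linear_weight(w).shape)  # torch.Size([32, 1176])
```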

Member

@youkaichao youkaichao left a comment

@zRzRzRzRzRzRzR can you help verify the correctness (w.r.t. pytorch 2.8)?

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Member

@ywang96 ywang96 left a comment

I have also communicated with Qwen team about this - let's get this in first.

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 24, 2025
@zou3519
Collaborator

zou3519 commented Oct 24, 2025

Just to check, does this get all of the performance back?

@ywang96
Member

ywang96 commented Oct 24, 2025

Just to check, does this get all of the performance back?

@zou3519 Yea it's lucky that kernel_size==stride in this case so we can use linear instead with the reshape hack. I talked to the model vendors and confirmed that they do still use Conv3d in the training code so hopefully this issue can be resolved soon.
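To make the "reshape hack" concrete: the patch embeddings arrive already flattened as (L, C), so the old code un-flattened them only to run the conv. A sketch with made-up shapes (not the real model configs) showing the old and new paths agree:

```python
import torch
import torch.nn as nn

L, in_ch, t, p, hidden = 4, 3, 2, 14, 32   # made-up shapes for illustration
C = in_ch * t * p * p

conv = nn.Conv3d(in_ch, hidden, kernel_size=(t, p, p), stride=(t, p, p), bias=False)
linear = nn.Linear(C, hidden, bias=False)
# Same flatten as the weight-conversion helper.
linear.weight.data.copy_(conv.weight.data.view(hidden, -1))

x = torch.randn(L, C)  # patches arrive already flattened as (L, C)

# Old path: un-flatten, run Conv3d, flatten the output back.
y_old = conv(x.view(L, in_ch, t, p, p)).view(L, hidden)
# New path: feed the flat input straight to the linear layer.
y_new = linear(x)
print(torch.allclose(y_old, y_new, atol=1e-5))  # True
```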

@ywang96 ywang96 enabled auto-merge (squash) October 24, 2025 07:22
@ywang96 ywang96 merged commit 42efe60 into vllm-project:main Oct 24, 2025
54 of 55 checks passed
@Isotr0py Isotr0py deleted the conv3d-to-linear branch October 24, 2025 07:48
atalhens pushed a commit to atalhens/vllm that referenced this pull request Oct 24, 2025
…ect#27418)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
@zou3519
Collaborator

zou3519 commented Oct 24, 2025

Just to check, does this get all of the performance back?

@zou3519 Yea it's lucky that kernel_size==stride in this case so we can use linear instead with the reshape hack. I talked to the model vendors and confirmed that they do still use Conv3d in the training code so hopefully this issue can be resolved soon.

Sounds good, I'll see what we can do on the PyTorch side (given it is a cudnn issue, the work might be on the Nvidia side)

kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
…ect#27418)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
rohin-garg pushed a commit to rohin-garg/vllm that referenced this pull request Oct 25, 2025
…ect#27418)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…ect#27418)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…ect#27418)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Oct 28, 2025
### What this PR does / why we need it?

vllm-project/vllm@c9461e0

Fix ```spec decode rejection sampler```, caused by
vllm-project/vllm#26060
Fix some ```import```, caused by
vllm-project/vllm#27374
Fix ```scheduler_config.send_delta_data```, caused by
#3719
Fix ```init_with_cudagraph_sizes```, caused by
vllm-project/vllm#26016
Fix ```vl model```of replacing PatchEmbed's conv3d to linear layer,
caused by vllm-project/vllm#27418

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with new added/existing test.


- vLLM version: v0.11.0rc3
- vLLM main:
vllm-project/vllm@c9461e0

---------

Signed-off-by: Icey <1790571317@qq.com>