Enable RMSNorm substitution for Transformers backend #26353
              Conversation
This change should enable quant fusions which depend on the `RMSNorm` op being present.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
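For readers less familiar with the Transformers backend: the idea is to walk the model built by Transformers and swap its RMSNorm-style layers for vLLM's `RMSNorm` op, so that compile-time quant fusion passes can pattern-match against that op. The snippet below is only a rough sketch of the idea, not the actual vLLM code; the helper name `replace_rms_norms` and the match-by-class-name heuristic are assumptions for illustration.

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Stand-in for vLLM's RMSNorm op (conventional ``x * w`` scaling)."""

    def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(-1, keepdim=True)
        return x * torch.rsqrt(variance + self.variance_epsilon) * self.weight


def replace_rms_norms(module: nn.Module) -> nn.Module:
    """Hypothetical helper: recursively swap ``*RMSNorm`` children for the op above."""
    for name, child in module.named_children():
        if type(child).__name__.endswith("RMSNorm"):
            new_norm = RMSNorm(child.weight.numel(),
                               getattr(child, "variance_epsilon", 1e-6))
            # Only valid when the child follows the ``x * w`` convention;
            # Gemma-style ``x * (1 + w)`` weights need special handling (see below).
            new_norm.weight.data.copy_(child.weight.data)
            setattr(module, name, new_norm)
        else:
            replace_rms_norms(child)
    return module
```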
Code Review
I've reviewed your changes to enable RMSNorm substitution. The approach to differentiate between RMSNorm and GemmaRMSNorm is clever, but I've identified a critical issue with handling weightless norms and a potential robustness improvement. Please see my detailed comments below.
    if weight_test is not None and torch.all(weight_test == 0):
        return GemmaRMSNorm(**kwargs)
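For context on why an all-zero weight is the signal here: in Transformers, `GemmaRMSNorm` scales by `(1 + weight)` and initializes `weight` to zeros, while a conventional RMSNorm scales by `weight` and initializes it to ones, so testing the freshly initialized weight distinguishes the two conventions. A simplified sketch of the two conventions (illustrative only, not the exact Transformers implementations):

```python
import torch
from torch import nn


def _rms(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Shared root-mean-square normalization, no scaling applied yet.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)


class ConventionalRMSNorm(nn.Module):
    # y = rms(x) * w, with w initialized to ones.
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return _rms(x) * self.weight


class GemmaStyleRMSNorm(nn.Module):
    # y = rms(x) * (1 + w), with w initialized to zeros -- exactly the
    # all-zeros pattern that the `weight_test` check above keys off.
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return _rms(x) * (1.0 + self.weight)
```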
Can we simply check for the existence of the `_norm` function to identify `GemmaRMSNorm`?
https://github.com/huggingface/transformers/blob/50090c3fc82e1e0a06b4da366ea2fb6055d529e9/src/transformers/models/gemma3n/modeling_gemma3n.py#L123-L124
That would work for Gemma models as they are currently implemented in Transformers. However:
- Custom models may not use this pattern
- `_norm` is a private method and so may change under us
- A counter-example would be Moshi, which implements `_norm` but does `x * w` instead of `x * (1 + w)` (see the sketch below)
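To make the Moshi point concrete: a module can define a `_norm` helper exactly like Gemma and still scale by `w` rather than `(1 + w)`, so a `hasattr(module, "_norm")` check would misclassify it. A simplified sketch (not the actual Moshi code):

```python
import torch
from torch import nn


class MoshiLikeRMSNorm(nn.Module):
    """Defines `_norm` like Gemma, but scales by `w`, not `(1 + w)`."""

    def __init__(self, dim: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def _norm(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Plain ``x * w`` scaling, despite exposing ``_norm``.
        return self._norm(x) * self.weight


# A presence check on `_norm` would wrongly flag this as Gemma-style,
# while its freshly initialized weight (all ones) would not.
assert hasattr(MoshiLikeRMSNorm(8), "_norm")
assert not torch.all(MoshiLikeRMSNorm(8).weight == 0)
```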
LGTM