[merger] fix: support vision_model keys in Megatron merger for VL models #4701
base: main
Conversation
Add vision_model to the skip_checking_keys list to allow merging Vision-Language model checkpoints. Previously, keys like vision_model.patch_embed.proj.weight would fail validation because they don't start with 'decoder'.

Fixes volcengine#4498

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
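For context, a minimal sketch of what the one-line change likely looks like in both merger files. Only the added "vision_model" entry is confirmed by this PR; the surrounding list entries are assumptions for illustration:

```python
# Sketch only: the PR confirms just the addition of "vision_model" to the
# skip list consulted by _check_megatron_state_key; other entries are assumed.
skip_checking_keys = [
    "embedding.word_embeddings",  # input embeddings (assumed entry)
    "output_layer",               # LM head (assumed entry)
    "vision_model",               # added by this PR for VL checkpoints
]
```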
Code Review
This pull request adds support for vision_model keys in the Megatron model merger, which is necessary for handling Vision-Language models. The changes are applied consistently to both legacy_model_merger.py and megatron_model_merger.py. The core logic of adding vision_model to the skip_checking_keys list is correct.
My main feedback is on improving the clarity of the docstrings and error messages within the _check_megatron_state_key function in both files. The current descriptions are a bit misleading about the key validation logic, and I've provided suggestions to make them more accurate and maintainable.
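To make that point concrete, here is a minimal sketch of the validation flow the review comments below describe (skip-list membership is a substring check; everything else must start with 'decoder'). This is an illustration assuming that description, not the verbatim verl implementation:

```python
def _check_megatron_state_key(key: str) -> None:
    """Sketch of the validation described in this review; not verbatim verl code."""
    if key.startswith("model."):
        raise ValueError(
            f"Invalid key {key} in Megatron state_dict. Keys starting with 'model.' are not allowed."
        )

    # Keys containing any of these substrings skip the 'decoder' prefix check.
    # "vision_model" is the entry this PR adds; the other entries are assumed.
    skip_checking_keys = ["embedding.word_embeddings", "output_layer", "vision_model"]
    if any(skip_key in key for skip_key in skip_checking_keys):
        return

    if not key.startswith("decoder"):
        raise ValueError(
            f"Invalid key {key} in Megatron state_dict. "
            f"Expected keys for 'decoder', 'embedding', 'output_layer', or 'vision_model'."
        )
```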
```diff
     Now the model merger only supports keys that start with "decoder/embedding/output_layer/vision_model" in TransformerLayer.
     Shall not use key starts with "model."
     """
     if key.startswith("model."):
         raise ValueError(
-            f"Invalid key {key} in Megatron state_dict. Expected keys to start with 'decoder/embedding/output_layer' in TransformerLayer."
+            f"Invalid key {key} in Megatron state_dict. Expected keys to start with 'decoder/embedding/output_layer/vision_model' in TransformerLayer."
         )
```
The updated docstring and error message are a bit misleading. They state that keys are expected to start with a specific set of prefixes, but the logic is more nuanced. The code actually checks if keys contain certain substrings (like vision_model) to skip them, or if they start with decoder. This discrepancy can be confusing for future maintenance.
To improve clarity and prevent potential bugs, I suggest updating the docstring and error message to more accurately reflect the validation logic.
Suggested change:

```diff
-    Now the model merger only supports keys that start with "decoder/embedding/output_layer/vision_model" in TransformerLayer.
-    Shall not use key starts with "model."
+    Now the model merger supports keys for decoder, embedding, output_layer, and vision_model components.
+    Keys starting with "model." are disallowed.
     """
     if key.startswith("model."):
         raise ValueError(
-            f"Invalid key {key} in Megatron state_dict. Expected keys to start with 'decoder/embedding/output_layer/vision_model' in TransformerLayer."
+            f"Invalid key {key} in Megatron state_dict. Keys starting with 'model.' are not allowed. "
+            f"Expected keys for 'decoder', 'embedding', 'output_layer', or 'vision_model'."
         )
```
```diff
     Now the model merger only supports keys that start with "decoder/embedding/output_layer/vision_model" in TransformerLayer.
     Shall not use key starts with "model."
     """
     if key.startswith("model."):
         raise ValueError(
             f"Invalid key {key} in Megatron state_dict. Expected keys to start with "
-            f"'decoder/embedding/output_layer' in TransformerLayer."
+            f"'decoder/embedding/output_layer/vision_model' in TransformerLayer."
         )
```
Same feedback as the comment above: the docstring and error message should reflect the actual validation logic (substring checks for the skip keys, a prefix check for decoder) rather than a strict prefix expectation.
Suggested change:

```diff
     Now the model merger only supports keys that start with "decoder/embedding/output_layer/vision_model" in TransformerLayer.
     Shall not use key starts with "model."
     """
     if key.startswith("model."):
         raise ValueError(
-            f"Invalid key {key} in Megatron state_dict. Expected keys to start with "
-            f"'decoder/embedding/output_layer/vision_model' in TransformerLayer."
+            f"Invalid key {key} in Megatron state_dict. Keys starting with 'model.' are not allowed. "
+            f"Expected keys for 'decoder', 'embedding', 'output_layer', or 'vision_model'."
         )
```
Summary
- Add vision_model to the skip_checking_keys list in the _check_megatron_state_key() method
- Applies to both megatron_model_merger.py and legacy_model_merger.py

This allows merging Vision-Language model checkpoints (like Qwen2.5-VL-7B) back to HuggingFace format. Previously, keys like vision_model.patch_embed.proj.weight would fail validation because the merger expected all keys to start with 'decoder'.

Fixes #4498
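As an illustration (not taken from the PR), here are the kinds of keys a Qwen2.5-VL-style Megatron checkpoint contains, run through the sketched check above. Only vision_model.patch_embed.proj.weight is confirmed by this PR; the other keys are representative examples:

```python
# Illustrative keys only; with "vision_model" in the skip list, all of them pass.
sample_keys = [
    "decoder.layers.0.self_attention.linear_qkv.weight",
    "embedding.word_embeddings.weight",
    "output_layer.weight",
    "vision_model.patch_embed.proj.weight",  # rejected before this PR
]
for key in sample_keys:
    _check_megatron_state_key(key)  # raises ValueError on invalid keys
```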
Test plan
🤖 Generated with Claude Code