
Conversation

@yurekami
Contributor

Summary

  • Add vision_model to the skip_checking_keys list in the _check_megatron_state_key() method
  • Update docstrings and error messages to reflect the new supported key prefix
  • Apply same fix to both megatron_model_merger.py and legacy_model_merger.py

This allows merging Vision-Language model checkpoints (like Qwen2.5-VL-7B) back to HuggingFace format. Previously, keys like vision_model.patch_embed.proj.weight would fail validation because the merger expected all keys to start with 'decoder'.
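For reference, a minimal sketch of the validation flow described above (control flow only; the exact contents of skip_checking_keys and the surrounding verl code differ):

    # Sketch of the check this PR relaxes, not the verl source.
    # Entries in skip_checking_keys other than "vision_model" are assumptions.
    def _check_megatron_state_key(key: str) -> None:
        skip_checking_keys = ["embedding", "output_layer", "vision_model"]
        # Keys for components outside the decoder stack are skipped entirely,
        # so vision_model.patch_embed.proj.weight now passes validation.
        if any(skip_key in key for skip_key in skip_checking_keys):
            return
        # A leading "model." prefix indicates a HuggingFace-style key,
        # which the Megatron merger rejects.
        if key.startswith("model."):
            raise ValueError(f"Invalid key {key} in Megatron state_dict.")
        # Everything else must belong to the decoder stack.
        if not key.startswith("decoder"):
            raise ValueError(f"Unexpected key {key} in Megatron state_dict.")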

Fixes #4498

Test plan

  • Test merging a VL model checkpoint (e.g., Qwen2.5-VL-7B trained with Megatron) back to HuggingFace format; a quick load check is sketched below
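As a smoke test once the merge completes, the output directory should load as an ordinary HuggingFace checkpoint (the path below is illustrative, not part of this PR):

    # Sanity-check the merged checkpoint with vanilla transformers.
    from transformers import AutoConfig, AutoModel

    merged_dir = "./merged_hf_model"  # illustrative output path, not from this PR
    config = AutoConfig.from_pretrained(merged_dir)
    print("model_type:", config.model_type)
    model = AutoModel.from_pretrained(merged_dir)
    # A successful load implies the vision_model.* keys were written under
    # the names the HuggingFace implementation expects.
    print(type(model).__name__, sum(p.numel() for p in model.parameters()), "parameters")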

🤖 Generated with Claude Code

Add vision_model to the skip_checking_keys list to allow merging
Vision-Language model checkpoints. Previously, keys like
vision_model.patch_embed.proj.weight would fail validation because
they don't start with 'decoder'.

Fixes volcengine#4498

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@gemini-code-assist (bot) left a comment

Code Review

This pull request adds support for vision_model keys in the Megatron model merger, which is necessary for handling Vision-Language models. The changes are applied consistently to both legacy_model_merger.py and megatron_model_merger.py. The core logic of adding vision_model to the skip_checking_keys list is correct.

My main feedback is on improving the clarity of the docstrings and error messages within the _check_megatron_state_key function in both files. The current descriptions are a bit misleading about the key validation logic, and I've provided suggestions to make them more accurate and maintainable.

Comment on lines +589 to 595

      Now the model merger only supports keys that start with "decoder/embedding/output_layer/vision_model" in TransformerLayer.
      Shall not use key starts with "model."
      """
      if key.startswith("model."):
          raise ValueError(
-             f"Invalid key {key} in Megatron state_dict. Expected keys to start with 'decoder/embedding/output_layer' in TransformerLayer."
+             f"Invalid key {key} in Megatron state_dict. Expected keys to start with 'decoder/embedding/output_layer/vision_model' in TransformerLayer."
          )
Severity: high

The updated docstring and error message are a bit misleading. They state that keys are expected to start with a specific set of prefixes, but the logic is more nuanced. The code actually checks if keys contain certain substrings (like vision_model) to skip them, or if they start with decoder. This discrepancy can be confusing for future maintenance.

To improve clarity and prevent potential bugs, I suggest updating the docstring and error message to more accurately reflect the validation logic.

Suggested change
-       Now the model merger only supports keys that start with "decoder/embedding/output_layer/vision_model" in TransformerLayer.
-       Shall not use key starts with "model."
-       """
-       if key.startswith("model."):
-           raise ValueError(
-               f"Invalid key {key} in Megatron state_dict. Expected keys to start with 'decoder/embedding/output_layer/vision_model' in TransformerLayer."
-           )
+       Now the model merger supports keys for decoder, embedding, output_layer, and vision_model components.
+       Keys starting with "model." are disallowed.
+       """
+       if key.startswith("model."):
+           raise ValueError(
+               f"Invalid key {key} in Megatron state_dict. Keys starting with 'model.' are not allowed. "
+               f"Expected keys for 'decoder', 'embedding', 'output_layer', or 'vision_model'."
+           )

Comment on lines +292 to 299

      Now the model merger only supports keys that start with "decoder/embedding/output_layer/vision_model" in TransformerLayer.
      Shall not use key starts with "model."
      """
      if key.startswith("model."):
          raise ValueError(
              f"Invalid key {key} in Megatron state_dict. Expected keys to start with "
-             f"'decoder/embedding/output_layer' in TransformerLayer."
+             f"'decoder/embedding/output_layer/vision_model' in TransformerLayer."
          )
Severity: high

The updated docstring and error message are a bit misleading. They state that keys are expected to start with a specific set of prefixes, but the logic is more nuanced. The code actually checks if keys contain certain substrings (like vision_model) to skip them, or if they start with decoder. This discrepancy can be confusing for future maintenance.

To improve clarity and prevent potential bugs, I suggest updating the docstring and error message to more accurately reflect the validation logic.

Suggested change
-       Now the model merger only supports keys that start with "decoder/embedding/output_layer/vision_model" in TransformerLayer.
-       Shall not use key starts with "model."
-       """
-       if key.startswith("model."):
-           raise ValueError(
-               f"Invalid key {key} in Megatron state_dict. Expected keys to start with "
-               f"'decoder/embedding/output_layer/vision_model' in TransformerLayer."
-           )
+       Now the model merger supports keys for decoder, embedding, output_layer, and vision_model components.
+       Keys starting with "model." are disallowed.
+       """
+       if key.startswith("model."):
+           raise ValueError(
+               f"Invalid key {key} in Megatron state_dict. Keys starting with 'model.' are not allowed. "
+               f"Expected keys for 'decoder', 'embedding', 'output_layer', or 'vision_model'."
+           )

@wuxibin89 requested a review from ISEEKYAN on January 5, 2026 at 12:15.

Development

Successfully merging this pull request may close these issues.

[Bug] Megatron merger do not support VL model, such as Qwen2.5-VL-7B
