
[model] support Qwen3.5 series models#76

Open
hjh0119 wants to merge 11 commits into modelscope:main from hjh0119:qwen35

Conversation


hjh0119 (Collaborator) commented Feb 26, 2026

No description provided.

gemini-code-assist (Contributor) commented

Summary of Changes

Hello @hjh0119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the model support within the Twinkle framework by integrating the Qwen3.5 series of large language models. The changes involve a substantial overhaul of the model loading and configuration system, introducing a more modular MegatronModelLoader to accommodate the diverse architectural requirements of these new models, particularly their mixed attention types and multimodal features. This enhancement enables users to leverage Qwen3.5 models for various tasks, including those requiring multimodal processing, with improved flexibility and configurability.

Highlights

  • Qwen3.5 Series Model Support: Introduced comprehensive support for Qwen3.5 series models, including both standard and Mixture-of-Experts (MoE) variants, as well as multimodal capabilities.
  • Modular Model Loading Refactor: Refactored the Megatron model registration and loading mechanism to use a MegatronModelLoader class, allowing for more flexible and model-specific configuration and layer specification (see the sketch after this list).
  • Qwen3.5 Specific Implementations: Added new Megatron model types and corresponding bridge and loader implementations for Qwen3.5 and Qwen3.5-MoE, handling their unique attention mechanisms (linear/full attention) and RMSNorm.
  • Configuration and Argument Updates: Updated model argument parsing and configuration conversion to correctly interpret and apply Qwen3.5-specific parameters, such as moe_router_enable_expert_bias and heterogeneous layer types.
  • Example Training Script: Provided an example training script demonstrating how to use the new Qwen3.5-35B-A3B model with LoRA in the Twinkle framework.
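
To make the loader refactor concrete, below is a minimal sketch of what loader-based registration could look like. MegatronModelLoader and MegatronModelMeta are named in this PR, but the hook names, the registry, and the Qwen3_5Loader body here are illustrative assumptions, not the actual implementation.

from dataclasses import dataclass
from typing import Dict, Type

class MegatronModelLoader:
    # Base class named in this PR; these hook names are assumed for illustration.
    def convert_config(self, hf_config) -> dict:
        # Map HuggingFace config fields onto Megatron arguments.
        raise NotImplementedError

    def get_layer_spec(self, config):
        # Return the (possibly per-layer) transformer layer specification.
        raise NotImplementedError

@dataclass
class MegatronModelMeta:
    megatron_model_type: str
    loader: Type[MegatronModelLoader]

_REGISTRY: Dict[str, MegatronModelMeta] = {}

def register_megatron_model(meta: MegatronModelMeta) -> None:
    _REGISTRY[meta.megatron_model_type] = meta

class Qwen3_5Loader(MegatronModelLoader):
    def convert_config(self, hf_config) -> dict:
        # e.g. pick up qk_layernorm, moe_router_enable_expert_bias, and the
        # heterogeneous (linear/full) attention layer types described above.
        return {}

register_megatron_model(MegatronModelMeta('qwen3_5', Qwen3_5Loader))

The point of the pattern is that each model family overrides only the hooks it needs, instead of threading model-specific branches through a monolithic loader.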


Changelog
  • cookbook/megatron/tp_moe_qwen35.py
    • Added a new example script for training Qwen3.5-35B-A3B with LoRA and Megatron.
  • src/twinkle/model/megatron/args.py
    • Added moe_router_enable_expert_bias argument.
    • Removed llm_model_type argument.
    • Introduced moe_shared_expert_intermediate_size property.
    • Updated multimodal model detection to use MegatronModelMeta.
    • Added new configuration parameters for rotary embeddings.
  • src/twinkle/model/megatron/model/__init__.py
    • Imported MegatronModelLoader.
  • src/twinkle/model/megatron/model/constant.py
    • Added qwen3_5 and qwen3_5_moe to ModelType and MegatronModelType enums.
  • src/twinkle/model/megatron/model/gpt_bridge.py
    • Introduced _HF_GROUPED_FALSE_TYPES and _get_transpose methods for handling HuggingFace model specificities.
    • Modified _set_mlp_state to use the new transpose logic and args.hf_model_type.
    • Updated the condition for _convert_mtp_layer to include qwen3_5.
  • src/twinkle/model/megatron/model/gpt_model.py
    • Adjusted _preprocess to correctly handle mrope_position_ids for rotary embeddings.
  • src/twinkle/model/megatron/model/gpts/qwen3_next.py
    • Added a new file implementing Qwen3NextRMSNorm, Qwen3NextSelfAttention, Qwen3NextGatedDeltaNet, Qwen3_5MoeGatedDeltaNet, and Qwen3NextLoader to support Qwen3-Next and Qwen3.5 models.
  • src/twinkle/model/megatron/model/mm_gpt_model.py
    • Updated reduce_scatter_to_sequence_parallel_region to use mpu.get_tensor_model_parallel_world_size().
  • src/twinkle/model/megatron/model/mm_gpts/__init__.py
    • Imported the new qwen3_5 module.
  • src/twinkle/model/megatron/model/mm_gpts/qwen3_5.py
    • Added a new file defining Qwen3_5Vit (vision module), Qwen3_5Bridge (multimodal bridge), and Qwen3_5MoeLoader, and registered them for Qwen3.5 multimodal models.
  • src/twinkle/model/megatron/model/register.py
    • Refactored MegatronModelMeta to include a loader attribute and introduced a MegatronModelLoader base class for modular model configuration.
    • Removed unused ArgumentParser import.
  • src/twinkle/model/megatron/utils/config.py
    • Expanded configuration mapping and conversion logic to support Qwen3.5 model specifics (see the sketch after this list).
    • Updated convert_hf_config to handle qwen3_5 and qwen3_5_moe model types, including their qk_layernorm and use_shared_expert_gate settings.
  • src/twinkle/processor/base.py
    • Modified collate_fn to prevent unnecessary processing for single inputs when using the 'megatron' framework.
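
As referenced in the src/twinkle/model/megatron/utils/config.py entry above, here is a minimal sketch of how heterogeneous layer types might be resolved during config conversion. The layer_types field name and the 'linear_attention'/'full_attention' values are assumptions modeled on Qwen3-Next-style HF configs; only the target module names come from this PR.

from typing import List

def resolve_attention_kinds(layer_types: List[str]) -> List[str]:
    # Map HF-style per-layer attention labels to Megatron-side module kinds.
    mapping = {
        'linear_attention': 'gated_delta_net',  # cf. Qwen3NextGatedDeltaNet
        'full_attention': 'self_attention',     # cf. Qwen3NextSelfAttention
    }
    kinds = []
    for i, label in enumerate(layer_types):
        if label not in mapping:
            raise ValueError(f'unknown layer type {label!r} at layer {i}')
        kinds.append(mapping[label])
    return kinds

# e.g. a 3:1 linear/full interleave, as in Qwen3-Next-style hybrid stacks:
print(resolve_attention_kinds(['linear_attention'] * 3 + ['full_attention']))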

gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the Qwen3.5 series of models. The changes include a significant and well-designed refactoring of the model creation process, moving towards a more extensible loader pattern. It adds model-specific components for Qwen3.5, such as a custom RMSNorm and attention implementation to handle its heterogeneous architecture. Overall, the changes are a solid improvement. I have identified one critical bug related to an incorrect super() call that would cause runtime errors, and a minor issue regarding unused imports in a new example script. After addressing these points, the PR should be in good shape.

"""

def __init__(self, config, submodules: SelfAttentionSubmodules, *args, **kwargs):
super(SelfAttention, self).__init__(config, submodules, *args, attention_type='self', **kwargs)

critical

The super() call is incorrect. super(SelfAttention, self) calls the __init__ of SelfAttention's parent class, skipping the initialization logic within SelfAttention itself. This will lead to AttributeErrors at runtime because attributes expected to be set by SelfAttention.__init__ will be missing.

To fix this, use super().__init__(...), which starts the method-resolution-order lookup at the current class and therefore correctly invokes SelfAttention.__init__ (the immediate parent).

Suggested change
-    super(SelfAttention, self).__init__(config, submodules, *args, attention_type='self', **kwargs)
+    super().__init__(config, submodules, *args, attention_type='self', **kwargs)
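
For illustration, a self-contained reproduction of the failure mode; the class bodies are toy stand-ins, and only the super() pattern matches the flagged code:

class Attention:
    def __init__(self):
        self.core = 'attention'  # set by the grandparent class

class SelfAttention(Attention):
    def __init__(self):
        super().__init__()
        self.qkv = 'fused qkv'   # set only by SelfAttention.__init__

class Qwen3NextSelfAttention(SelfAttention):
    def __init__(self):
        # BUG: super(SelfAttention, self) starts the MRO lookup *after*
        # SelfAttention, so SelfAttention.__init__ never runs.
        super(SelfAttention, self).__init__()

attn = Qwen3NextSelfAttention()
print(attn.core)  # 'attention' -- Attention.__init__ did run
print(attn.qkv)   # raises AttributeError: SelfAttention.__init__ was skipped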

