
[model] support Qwen3.5 series models#76

Open
hjh0119 wants to merge 11 commits into modelscope:main from hjh0119:qwen35

Conversation


hjh0119 (Collaborator) commented Feb 26, 2026

No description provided.

gemini-code-assist (Contributor) commented

Summary of Changes

Hello @hjh0119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the model support within the Twinkle framework by integrating the Qwen3.5 series of large language models. The changes involve a substantial overhaul of the model loading and configuration system, introducing a more modular MegatronModelLoader to accommodate the diverse architectural requirements of these new models, particularly their mixed attention types and multimodal features. This enhancement enables users to leverage Qwen3.5 models for various tasks, including those requiring multimodal processing, with improved flexibility and configurability.

Highlights

  • Qwen3.5 Series Model Support: Introduced comprehensive support for Qwen3.5 series models, including both standard and Mixture-of-Experts (MoE) variants, as well as multimodal capabilities.
  • Modular Model Loading Refactor: Refactored the Megatron model registration and loading mechanism to use a MegatronModelLoader class, allowing for more flexible and model-specific configuration and layer specification (see the sketch after this list).
  • Qwen3.5 Specific Implementations: Added new Megatron model types and corresponding bridge and loader implementations for Qwen3.5 and Qwen3.5-MoE, handling their unique attention mechanisms (linear/full attention) and RMSNorm.
  • Configuration and Argument Updates: Updated model argument parsing and configuration conversion to correctly interpret and apply Qwen3.5-specific parameters, such as moe_router_enable_expert_bias and heterogeneous layer types.
  • Example Training Script: Provided an example training script demonstrating how to use the new Qwen3.5-35B-A3B model with LoRA in the Twinkle framework.
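
To make the loader refactor concrete, below is a minimal sketch of what loader-based registration could look like. MegatronModelLoader and MegatronModelMeta are named in this PR, but the hook names, the registry, and the Qwen3_5Loader body here are illustrative assumptions, not the actual implementation.

from dataclasses import dataclass
from typing import Dict, Type

class MegatronModelLoader:
    # Base class named in this PR; these hook names are assumed for illustration.
    def convert_config(self, hf_config) -> dict:
        # Map HuggingFace config fields onto Megatron arguments.
        raise NotImplementedError

    def get_layer_spec(self, config):
        # Return the (possibly per-layer) transformer layer specification.
        raise NotImplementedError

@dataclass
class MegatronModelMeta:
    megatron_model_type: str
    loader: Type[MegatronModelLoader]

_REGISTRY: Dict[str, MegatronModelMeta] = {}

def register_megatron_model(meta: MegatronModelMeta) -> None:
    _REGISTRY[meta.megatron_model_type] = meta

class Qwen3_5Loader(MegatronModelLoader):
    def convert_config(self, hf_config) -> dict:
        # e.g. pick up qk_layernorm, moe_router_enable_expert_bias, and the
        # heterogeneous (linear/full) attention layer types described above.
        return {}

register_megatron_model(MegatronModelMeta('qwen3_5', Qwen3_5Loader))

The point of the pattern is that each model family overrides only the hooks it needs, instead of threading model-specific branches through a monolithic loader.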


Changelog
  • cookbook/megatron/tp_moe_qwen35.py
    • Added a new example script for training Qwen3.5-35B-A3B with LoRA and Megatron.
  • src/twinkle/model/megatron/args.py
    • Added moe_router_enable_expert_bias argument.
    • Removed llm_model_type argument.
    • Introduced moe_shared_expert_intermediate_size property.
    • Updated multimodal model detection to use MegatronModelMeta.
    • Added new configuration parameters for rotary embeddings.
  • src/twinkle/model/megatron/model/__init__.py
    • Imported MegatronModelLoader.
  • src/twinkle/model/megatron/model/constant.py
    • Added qwen3_5 and qwen3_5_moe to ModelType and MegatronModelType enums.
  • src/twinkle/model/megatron/model/gpt_bridge.py
    • Introduced _HF_GROUPED_FALSE_TYPES and _get_transpose methods for handling HuggingFace model specificities.
    • Modified _set_mlp_state to use the new transpose logic and args.hf_model_type.
    • Updated the condition for _convert_mtp_layer to include qwen3_5.
  • src/twinkle/model/megatron/model/gpt_model.py
    • Adjusted _preprocess to correctly handle mrope_position_ids for rotary embeddings.
  • src/twinkle/model/megatron/model/gpts/qwen3_next.py
    • Added a new file implementing Qwen3NextRMSNorm, Qwen3NextSelfAttention, Qwen3NextGatedDeltaNet, Qwen3_5MoeGatedDeltaNet, and Qwen3NextLoader to support Qwen3-Next and Qwen3.5 models.
  • src/twinkle/model/megatron/model/mm_gpt_model.py
    • Updated reduce_scatter_to_sequence_parallel_region to use mpu.get_tensor_model_parallel_world_size().
  • src/twinkle/model/megatron/model/mm_gpts/__init__.py
    • Imported the new qwen3_5 module.
  • src/twinkle/model/megatron/model/mm_gpts/qwen3_5.py
    • Added a new file defining Qwen3_5Vit (vision module), Qwen3_5Bridge (multimodal bridge), and Qwen3_5MoeLoader, and registered them for Qwen3.5 multimodal models.
  • src/twinkle/model/megatron/model/register.py
    • Refactored MegatronModelMeta to include a loader attribute and introduced a MegatronModelLoader base class for modular model configuration.
    • Removed unused ArgumentParser import.
  • src/twinkle/model/megatron/utils/config.py
    • Expanded configuration mapping and conversion logic to support Qwen3.5 model specifics (see the sketch after this list).
    • Updated convert_hf_config to handle qwen3_5 and qwen3_5_moe model types, including their qk_layernorm and use_shared_expert_gate settings.
  • src/twinkle/processor/base.py
    • Modified collate_fn to prevent unnecessary processing for single inputs when using the 'megatron' framework.
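
As referenced in the src/twinkle/model/megatron/utils/config.py entry above, here is a minimal sketch of how heterogeneous layer types might be resolved during config conversion. The layer_types field name and the 'linear_attention'/'full_attention' values are assumptions modeled on Qwen3-Next-style HF configs; only the target module names come from this PR.

from typing import List

def resolve_attention_kinds(layer_types: List[str]) -> List[str]:
    # Map HF-style per-layer attention labels to Megatron-side module kinds.
    mapping = {
        'linear_attention': 'gated_delta_net',  # cf. Qwen3NextGatedDeltaNet
        'full_attention': 'self_attention',     # cf. Qwen3NextSelfAttention
    }
    kinds = []
    for i, label in enumerate(layer_types):
        if label not in mapping:
            raise ValueError(f'unknown layer type {label!r} at layer {i}')
        kinds.append(mapping[label])
    return kinds

# e.g. a 3:1 linear/full interleave, as in Qwen3-Next-style hybrid stacks:
print(resolve_attention_kinds(['linear_attention'] * 3 + ['full_attention']))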

gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the Qwen3.5 series of models. The changes include a significant and well-designed refactoring of the model creation process, moving towards a more extensible loader pattern. It adds model-specific components for Qwen3.5, such as a custom RMSNorm and attention implementation to handle its heterogeneous architecture. Overall, the changes are a solid improvement. I have identified one critical bug related to an incorrect super() call that would cause runtime errors, and a minor issue regarding unused imports in a new example script. After addressing these points, the PR should be in good shape.

"""

def __init__(self, config, submodules: SelfAttentionSubmodules, *args, **kwargs):
super(SelfAttention, self).__init__(config, submodules, *args, attention_type='self', **kwargs)

critical

The super() call is incorrect. super(SelfAttention, self) calls the __init__ of SelfAttention's parent class, skipping the initialization logic within SelfAttention itself. This will lead to AttributeErrors at runtime because attributes expected to be set by SelfAttention.__init__ will be missing.

To fix this, use super().__init__(...), which starts the method-resolution-order lookup at the current class and therefore correctly invokes SelfAttention.__init__ (the immediate parent).

Suggested change
-    super(SelfAttention, self).__init__(config, submodules, *args, attention_type='self', **kwargs)
+    super().__init__(config, submodules, *args, attention_type='self', **kwargs)
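
For illustration, a self-contained reproduction of the failure mode; the class bodies are toy stand-ins, and only the super() pattern matches the flagged code:

class Attention:
    def __init__(self):
        self.core = 'attention'  # set by the grandparent class

class SelfAttention(Attention):
    def __init__(self):
        super().__init__()
        self.qkv = 'fused qkv'   # set only by SelfAttention.__init__

class Qwen3NextSelfAttention(SelfAttention):
    def __init__(self):
        # BUG: super(SelfAttention, self) starts the MRO lookup *after*
        # SelfAttention, so SelfAttention.__init__ never runs.
        super(SelfAttention, self).__init__()

attn = Qwen3NextSelfAttention()
print(attn.core)  # 'attention' -- Attention.__init__ did run
print(attn.qkv)   # raises AttributeError: SelfAttention.__init__ was skipped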

