Intern s2 preview lite awq fix bug by 43758726 · Pull Request #4600 · InternLM/lmdeploy

43758726 · 2026-05-19T15:22:17Z

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.

Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

Pre-commit or other linting tools are used to fix the potential lint issues.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
The documentation has been modified accordingly, like docstring or example tutorials.

Copilot

Pull request overview

This PR updates LMDeploy Lite quantization/calibration paths to better support Qwen3.5 / InternS2Preview architectures and to improve AWQ usability (including a “data-free” mode), alongside a small VLM utility update and a batch-splitting fix.

Changes:

Add InternS2Preview/Qwen3.5 model build support in the VLM wrapper and fix batch splitting for Qwen3.5 position_embeddings.
Introduce lmdeploy.lite.model registry-based per-architecture helpers to drive skip patterns (and some MoE parameter rewrites), and propagate skip lists into quantization_config.
Refactor calibration loading to return the resolved HF architecture and add calib_samples=0 flow for data-free AWQ.
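
The registry idea in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: the `Registry` class, `BaseHelper`, `QwenLikeHelper`, `resolve_skip_patterns`, and the example patterns (`'visual'`, `'mlp.gate'`) are all hypothetical. Only the names `MODELS`, `skipped_modules`, and `modules_to_not_convert` appear in the pull request.

```python
from typing import Callable, Dict, List, Optional, Type


class Registry:
    """Tiny name -> class registry (hypothetical stand-in for the PR's MODELS)."""

    def __init__(self) -> None:
        self._classes: Dict[str, Type] = {}

    def register(self, *names: str) -> Callable[[Type], Type]:
        """Return a class decorator that registers the class under each name."""

        def decorator(cls: Type) -> Type:
            for name in names:
                self._classes[name] = cls
            return cls

        return decorator

    def get(self, name: str) -> Optional[Type]:
        return self._classes.get(name)


MODELS = Registry()


class BaseHelper:
    """Default helper: no architecture-specific skip patterns."""

    @classmethod
    def skipped_modules(cls) -> List[str]:
        return []


@MODELS.register('Qwen3_5MoeForConditionalGeneration',
                 'InternS2PreviewForConditionalGeneration')
class QwenLikeHelper(BaseHelper):

    @classmethod
    def skipped_modules(cls) -> List[str]:
        # Illustrative patterns only; the real lists live in the PR.
        return ['visual', 'mlp.gate']


def resolve_skip_patterns(arch: str) -> List[str]:
    """Look up the helper for an HF architecture name, falling back to the base."""
    helper = MODELS.get(arch) or BaseHelper
    return list(helper.skipped_modules())


def build_quantization_config(arch: str) -> dict:
    """Propagate the resolved skip list into a quantization config dict."""
    return {'modules_to_not_convert': resolve_skip_patterns(arch)}


print(build_quantization_config('InternS2PreviewForConditionalGeneration'))
```

Keeping the lookup keyed on the architecture string means an unregistered model falls back to an empty skip list instead of raising.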

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `lmdeploy/vl/model/qwen3_5.py` | Adds `build_model()` handling for Qwen3.5 and InternS2Preview VLM variants. |
| `lmdeploy/lite/utils/batch_split.py` | Adjusts splitting logic for Qwen3.5 `position_embeddings` tuple layout. |
| `lmdeploy/lite/quantization/awq.py` | Adds new skip-pattern plumbing and changes skip logic; extends layernorm mapping for Qwen3 MoE. |
| `lmdeploy/lite/model/base.py` | Introduces `MODELS` registry base helper for model-specific quantization support. |
| `lmdeploy/lite/model/qwen.py` | Registers Qwen3/Qwen3.5/InternS2Preview skip patterns and MoE conversion helper. |
| `lmdeploy/lite/model/mixtral.py` | Registers Mixtral helper and version-dependent skip patterns. |
| `lmdeploy/lite/model/__init__.py` | Initializes Lite model registry and imports registered helpers. |
| `lmdeploy/lite/apis/smooth_quant.py` | Threads `trust_remote_code`, consumes new calibrate return shape, and writes `modules_to_not_convert`. |
| `lmdeploy/lite/apis/calibrate.py` | Refactors model/tokenizer loading, expands supported model maps, and returns `arch`. |
| `lmdeploy/lite/apis/auto_awq.py` | Adds `calib_samples=0` data-free mode and uses per-arch helpers/skip list propagation. |
| `lmdeploy/cli/utils.py` | Updates CLI help text to document `--calib-samples 0`. |
| `lmdeploy/archs.py` | Removes workspace (TurboMind converted model) shortcut from `get_task()`. |

Comments suppressed due to low confidence (1)

lmdeploy/archs.py:146

get_task() no longer handles local TurboMind converted/workspace model directories (typically containing triton_models/weights). Without this short-circuit, calling get_task() on a converted TurboMind model path will fall through to get_model_arch() and likely fail because there is no HF config to load. Please restore the workspace detection (or add equivalent handling in get_model_arch()) so converted TurboMind models continue to be recognized correctly.

def get_task(backend: str, model_path: str, trust_remote_code: bool = False):
    """Get pipeline type and pipeline class from model config."""
    from lmdeploy.serve.core import AsyncEngine

    _, config = get_model_arch(model_path, trust_remote_code=trust_remote_code)
    if check_vl_llm(backend, config.to_dict()):
        from lmdeploy.serve.core import VLAsyncEngine
        return 'vlm', VLAsyncEngine

    # default task, pipeline_class
    return 'llm', AsyncEngine
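
A rough sketch of the workspace short-circuit the comment asks to restore is shown here. It assumes a converted TurboMind directory can be recognized by a `triton_models/weights` subdirectory, as the comment describes. The helper names `is_turbomind_workspace` and `get_task_name`, and the injected `resolve_from_hf_config` callback, are hypothetical and exist only to keep the example runnable without lmdeploy installed.

```python
import os
from typing import Callable


def is_turbomind_workspace(model_path: str) -> bool:
    """Return True if the path looks like a converted TurboMind workspace."""
    weights_dir = os.path.join(model_path, 'triton_models', 'weights')
    return os.path.isdir(weights_dir)


def get_task_name(model_path: str,
                  resolve_from_hf_config: Callable[[str], str]) -> str:
    """Pick a task name, short-circuiting for workspace directories.

    `resolve_from_hf_config` stands in for the HF-config based path
    (get_model_arch + check_vl_llm in the snippet above).
    """
    if is_turbomind_workspace(model_path):
        # No HF config is available here, so do not try to load one.
        return 'llm'
    return resolve_from_hf_config(model_path)
```

Running the check before any config loading keeps converted model paths from ever reaching `get_model_arch()`.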

    'Qwen2ForCausalLM': 'Qwen2DecoderLayer',
    'Qwen3ForCausalLM': 'Qwen3DecoderLayer',
+    'Qwen3MoeForCausalLM': 'Qwen3MoeDecoderLayer',
+    'Qwen3_5ForConditionalGeneration': 'Qwen3_5DecoderLayer',
+    'Qwen3_5MoeForConditionalGeneration': 'Qwen3_5MoeDecoderLayer',


    'LlavaLlamaForCausalLM': 'LlamaDecoderLayer',
    'MGMLlamaForCausalLM': 'LlamaDecoderLayer',  # mini gemini
    'InternLMXComposer2ForCausalLM': 'InternLM2DecoderLayer',
+    'InternS2PreviewForConditionalGeneration': 'InternS2PreviewDecoderLayer',


+    'Qwen3MoeDecoderLayer': {
+        'input_layernorm': ['self_attn.k_proj', 'self_attn.q_proj', 'self_attn.v_proj'],
+        'post_attention_layernorm': ['mlp.gate_proj', 'mlp.up_proj']
+    },
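
The mapping above pairs each norm layer with the linear layers that consume its output. The diff does not show how the mapping is used, so the following is only a sketch under the assumption that it drives AWQ-style scale folding: dividing the norm weight by a per-channel scale and multiplying the matching input columns of each linear weight by the same scale leaves the layer output unchanged. The helper names `linear` and `fold_scales` and all numeric values are made up for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def linear(weight: Matrix, x: List[float]) -> List[float]:
    """Compute weight @ x for a row-major (out_features x in_features) matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fold_scales(norm_weight: List[float], linear_weights: List[Matrix],
                scales: List[float]) -> Tuple[List[float], List[Matrix]]:
    """Divide the norm weight by `scales`; multiply linear input columns by them."""
    new_norm = [g / s for g, s in zip(norm_weight, scales)]
    new_linears = [[[w * s for w, s in zip(row, scales)] for row in weight]
                   for weight in linear_weights]
    return new_norm, new_linears


# Toy values: a normalized activation, a norm weight, and one projection.
normed = [0.5, -1.0, 2.0]
norm_weight = [1.0, 2.0, 0.5]
q_proj = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]
scales = [2.0, 4.0, 0.5]

before = linear(q_proj, [g * v for g, v in zip(norm_weight, normed)])
new_norm, (new_q,) = fold_scales(norm_weight, [q_proj], scales)
after = linear(new_q, [g * v for g, v in zip(new_norm, normed)])

# The two outputs agree up to floating-point rounding.
print(all(math.isclose(a, b) for a, b in zip(before, after)))
```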


+    """
+
+    patterns.extend(SKIPPED_MODULE)

-def skipped_module(name: str):
-    """Whether the module should be skipped from quantization."""
-    for m in SKIPPED_MODULE:
-        if m in name:
-            return True
-    return False
+    return next(((True, pattern) for pattern in patterns if pattern in name), (False, None))
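
The new return shape in the hunk above, a `(matched, pattern)` tuple instead of a bare boolean, can be illustrated with a standalone sketch. The function name `match_skip_pattern`, the `extra_patterns` parameter, and the contents of `SKIPPED_MODULE` are assumptions for the example. Only the `next(...)` expression and the `patterns.extend(SKIPPED_MODULE)` call mirror the diff.

```python
from typing import List, Optional, Tuple

# Hypothetical global defaults; the real list is defined in awq.py.
SKIPPED_MODULE = ['lora', 'vision_model']


def match_skip_pattern(
        name: str,
        extra_patterns: Optional[List[str]] = None
) -> Tuple[bool, Optional[str]]:
    """Return (True, pattern) for the first pattern contained in `name`,
    or (False, None) when no pattern matches."""
    # Copy so that extending with the defaults does not mutate the caller's list.
    patterns = list(extra_patterns or [])
    patterns.extend(SKIPPED_MODULE)
    return next(((True, pattern) for pattern in patterns if pattern in name),
                (False, None))


print(match_skip_pattern('model.layers.0.mlp.gate', ['mlp.gate']))
print(match_skip_pattern('model.layers.0.self_attn.q_proj'))
```

Returning the matching pattern lets the caller record which pattern caused a skip, which is what a `modules_to_not_convert` list needs.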


+def get_task(backend: str, model_path: str):
+    """Get pipeline type and pipeline class from model config."""
+
+    _, config = get_model_arch(model_path)


-        torch.cuda.empty_cache()
+    patterns = []
+    skipped_modules = []
+    arch = model.config.architectures[0]


+
+    @classmethod
+    def skipped_modules(cls):
+        pass


43758726 added 9 commits April 28, 2026 16:00

Add Qwen3.5 Moe lite awq

ea1fe52

Fix bug

9507249

convert Qwen3.5 model for vision model

935ba63

Merge branch 'main' into InternS2_preview_lite_awq

74687f5

Improve the convert and skipped_module parts in lite module

9fdbb3a

Merge branch 'main' into InternS2_preview_lite_awq

dc3fa5d

fix suggestions

dd759a2

remove commented-out code

be20b6e

InternS2_preview_lite_awq_fix_bug

8e6af69

Copilot AI review requested due to automatic review settings May 19, 2026 15:22

Copilot started reviewing on behalf of 43758726 May 19, 2026 15:23

Copilot AI reviewed May 19, 2026

Merge branch 'main' into InternS2_preview_lite_awq_fix_bug

a7a6068

43758726 wants to merge 10 commits into InternLM:main from 43758726:InternS2_preview_lite_awq_fix_bug

2 participants