Intern s2 preview lite awq fix bug #4600

Open
43758726 wants to merge 10 commits into InternLM:main from 43758726:InternS2_preview_lite_awq_fix_bug

@43758726
Collaborator

Thanks for your contribution; we appreciate it a lot. Following the instructions below will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.

Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

Copilot AI review requested due to automatic review settings May 19, 2026 15:22
Contributor

Copilot AI left a comment

Pull request overview

This PR updates LMDeploy Lite quantization/calibration paths to better support Qwen3.5 / InternS2Preview architectures and to improve AWQ usability (including a “data-free” mode), alongside a small VLM utility update and a batch-splitting fix.

Changes:

  • Add InternS2Preview/Qwen3.5 model build support in the VLM wrapper and fix batch splitting for Qwen3.5 position_embeddings.
  • Introduce lmdeploy.lite.model registry-based per-architecture helpers to drive skip patterns (and some MoE parameter rewrites), and propagate skip lists into quantization_config.
  • Refactor calibration loading to return the resolved HF architecture and add calib_samples=0 flow for data-free AWQ.
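
The registry-driven skip list described in the second bullet can be pictured roughly as follows. This is a minimal sketch, not the PR's code: the `Registry` class, the helper class names, and the `'lm_head'` and `'mlp.gate'` patterns are assumptions made for illustration. Only the `MODELS` name, the `skipped_modules` classmethod, and the `modules_to_not_convert` key appear in the review itself.

```python
from typing import Callable, Dict, List, Type


class Registry:
    """Tiny name -> class registry (hypothetical stand-in for the PR's MODELS registry)."""

    def __init__(self) -> None:
        self._classes: Dict[str, Type] = {}

    def register(self, *names: str) -> Callable[[Type], Type]:
        """Return a class decorator that registers the class under every given name."""
        def decorator(cls: Type) -> Type:
            for name in names:
                self._classes[name] = cls
            return cls
        return decorator

    def get(self, name: str) -> Type:
        return self._classes[name]


MODELS = Registry()


class BaseHelper:
    """Default per-architecture helper (hypothetical)."""

    @classmethod
    def skipped_modules(cls) -> List[str]:
        # Assumed default: keep the output head unquantized.
        return ['lm_head']


@MODELS.register('Qwen3MoeForCausalLM')
class Qwen3MoeHelper(BaseHelper):
    @classmethod
    def skipped_modules(cls) -> List[str]:
        # Assumed pattern for illustration: also skip the MoE router.
        return super().skipped_modules() + ['mlp.gate']


def build_quantization_config(arch: str) -> dict:
    """Propagate the helper's skip list into a quantization_config dict."""
    helper = MODELS.get(arch)
    return {
        'quant_method': 'awq',
        'modules_to_not_convert': helper.skipped_modules(),
    }


config = build_quantization_config('Qwen3MoeForCausalLM')
```

Keeping the skip patterns next to the architecture name means that adding a new model only requires a new registered helper, rather than another branch in the quantization code.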

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| lmdeploy/vl/model/qwen3_5.py | Adds build_model() handling for Qwen3.5 and InternS2Preview VLM variants. |
| lmdeploy/lite/utils/batch_split.py | Adjusts splitting logic for Qwen3.5 position_embeddings tuple layout. |
| lmdeploy/lite/quantization/awq.py | Adds new skip-pattern plumbing and changes skip logic; extends layernorm mapping for Qwen3 MoE. |
| lmdeploy/lite/model/base.py | Introduces MODELS registry base helper for model-specific quantization support. |
| lmdeploy/lite/model/qwen.py | Registers Qwen3/Qwen3.5/InternS2Preview skip patterns and MoE conversion helper. |
| lmdeploy/lite/model/mixtral.py | Registers Mixtral helper and version-dependent skip patterns. |
| lmdeploy/lite/model/__init__.py | Initializes Lite model registry and imports registered helpers. |
| lmdeploy/lite/apis/smooth_quant.py | Threads trust_remote_code, consumes new calibrate return shape, and writes modules_to_not_convert. |
| lmdeploy/lite/apis/calibrate.py | Refactors model/tokenizer loading, expands supported model maps, and returns arch. |
| lmdeploy/lite/apis/auto_awq.py | Adds calib_samples=0 data-free mode and uses per-arch helpers/skip list propagation. |
| lmdeploy/cli/utils.py | Updates CLI help text to document --calib-samples 0. |
| lmdeploy/archs.py | Removes workspace (TurboMind converted model) shortcut from get_task(). |
Comments suppressed due to low confidence (1)

lmdeploy/archs.py:146

  • get_task() no longer handles local TurboMind converted/workspace model directories (typically containing triton_models/weights). Without this short-circuit, calling get_task() on a converted TurboMind model path will fall through to get_model_arch() and likely fail because there is no HF config to load. Please restore the workspace detection (or add equivalent handling in get_model_arch()) so converted TurboMind models continue to be recognized correctly.
def get_task(backend: str, model_path: str, trust_remote_code: bool = False):
    """Get pipeline type and pipeline class from model config."""
    from lmdeploy.serve.core import AsyncEngine

    _, config = get_model_arch(model_path, trust_remote_code=trust_remote_code)
    if check_vl_llm(backend, config.to_dict()):
        from lmdeploy.serve.core import VLAsyncEngine
        return 'vlm', VLAsyncEngine

    # default task, pipeline_class
    return 'llm', AsyncEngine
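
A workspace check of the kind this comment asks to restore might look like the sketch below. It assumes only the directory layout mentioned in the comment (`triton_models/weights` under the model path). The function name `is_turbomind_workspace` is hypothetical, and this is not the code that was removed from `get_task()`.

```python
import os
import tempfile


def is_turbomind_workspace(model_path: str) -> bool:
    """Return True if model_path looks like a converted TurboMind workspace.

    Assumption: a workspace is recognized by a triton_models/weights
    subdirectory, as described in the review comment.
    """
    return os.path.isdir(os.path.join(model_path, 'triton_models', 'weights'))


# Usage: build one plain directory and one workspace-like directory.
with tempfile.TemporaryDirectory() as root:
    plain_dir = os.path.join(root, 'hf_model')
    workspace_dir = os.path.join(root, 'workspace')
    os.makedirs(plain_dir)
    os.makedirs(os.path.join(workspace_dir, 'triton_models', 'weights'))

    plain_is_workspace = is_turbomind_workspace(plain_dir)
    converted_is_workspace = is_turbomind_workspace(workspace_dir)
```

A caller could run such a check before `get_model_arch()` so that paths without a Hugging Face config never reach the config loader.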

Comment on lines 19 to +23
'Qwen2ForCausalLM': 'Qwen2DecoderLayer',
'Qwen3ForCausalLM': 'Qwen3DecoderLayer',
'Qwen3MoeForCausalLM': 'Qwen3MoeDecoderLayer',
'Qwen3_5ForConditionalGeneration': 'Qwen3_5DecoderLayer',
'Qwen3_5MoeForConditionalGeneration': 'Qwen3_5MoeDecoderLayer',
'LlavaLlamaForCausalLM': 'LlamaDecoderLayer',
'MGMLlamaForCausalLM': 'LlamaDecoderLayer', # mini gemini
'InternLMXComposer2ForCausalLM': 'InternLM2DecoderLayer',
'InternS2PreviewForConditionalGeneration': 'InternS2PreviewDecoderLayer',
'Qwen3MoeDecoderLayer': {
    'input_layernorm': ['self_attn.k_proj', 'self_attn.q_proj', 'self_attn.v_proj'],
    'post_attention_layernorm': ['mlp.gate_proj', 'mlp.up_proj']
},
Comment on lines +137 to +141
"""

patterns.extend(SKIPPED_MODULE)

def skipped_module(name: str):
    """Whether the module should be skipped from quantization."""
    for m in SKIPPED_MODULE:
        if m in name:
            return True
    return False
    return next(((True, pattern) for pattern in patterns if pattern in name), (False, None))
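
One consequence of the tuple-returning `next(...)` expression in this hunk is worth spelling out: a non-empty tuple is always truthy in Python, so a caller that still writes `if skipped_module(name):` would treat every module as skipped. The sketch below adapts the expression so that `patterns` is passed as a parameter (in the hunk it comes from the enclosing scope). The pattern strings and module name are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def skipped_module(name: str, patterns: List[str]) -> Tuple[bool, Optional[str]]:
    """Return (True, pattern) for the first pattern contained in name, else (False, None)."""
    return next(((True, p) for p in patterns if p in name), (False, None))


patterns = ['lm_head', 'mlp.gate.']  # assumed patterns, for illustration only

# No pattern matches this name, so the result is (False, None).
result = skipped_module('model.layers.0.self_attn.q_proj', patterns)

# Pitfall: the tuple itself is truthy even though its first element is False.
naive_check = bool(result)

# Safe usage: unpack the tuple and test the flag.
skip, matched = result

# A matching name returns the pattern that triggered the skip.
head_skip, head_pattern = skipped_module('lm_head', patterns)
```

Unpacking at every call site, or returning only the matched pattern (with `None` for no match), avoids the truthiness trap.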
Comment on lines +183 to +186
def get_task(backend: str, model_path: str):
    """Get pipeline type and pipeline class from model config."""

    _, config = get_model_arch(model_path)
torch.cuda.empty_cache()
patterns = []
skipped_modules = []
arch = model.config.architectures[0]

@classmethod
def skipped_modules(cls):
    pass

2 participants