Skip to content

Conversation

@yao-fengchen
Copy link
Collaborator

No description provided.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes accuracy issues in the dlinfer backend by improving numerical precision in rotary embeddings, updating tensor type conversions, and upgrading dependencies to stable releases.

  • Refactored rotary embedding inverse frequency calculation to use native float types and eliminate unnecessary type conversions
  • Updated Ascend NPU backend tensor operations to use explicit int32 conversions for improved compatibility
  • Added support for grouped MoE routing with n_groups parameter
  • Upgraded CANN and torch-npu dependencies from release candidates to stable versions

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
lmdeploy/pytorch/backends/dlinfer/rotary_embedding.py Improved numerical precision by changing base parameter to float and refactoring inv_freq calculation to avoid intermediate int64 conversions
lmdeploy/pytorch/backends/dlinfer/moe.py Added n_groups parameter support to align with base SoftmaxTopKBuilder interface
lmdeploy/pytorch/backends/dlinfer/ascend/op_backend.py Updated tensor type conversions to explicitly use int32 for Ascend NPU compatibility
docker/Dockerfile_ascend_a3 Upgraded CANN from 8.3.rc1.alpha002 to 8.3.rc1 and torch-npu from 2.8.0rc1 to 2.8.0
docker/Dockerfile_ascend_a2_300i Upgraded CANN from 8.3.rc1.alpha002 to 8.3.rc1 and torch-npu from 2.8.0rc1 to 2.8.0

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

kv_seqlens = kv_seqlens.repeat_interleave(step_context.q_seqlens, 0)
if not is_unpaged_prefill and AscendOpsBackend.enable_aclgraph():
kv_seqlens = kv_seqlens.cpu().tolist()
kv_seqlens = kv_seqlens.cpu().to(torch.int32)
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable kv_seqlens is already converted to CPU and torch.int32 type at line 254. This redundant conversion at line 273 (.cpu().to(torch.int32)) is unnecessary since kv_seqlens is already on CPU and in int32 format. The original .tolist() conversion was likely intended for a specific purpose, but if the change is intentional, the redundant .cpu() call can be removed.

Suggested change
kv_seqlens = kv_seqlens.cpu().to(torch.int32)
pass

Copilot uses AI. Check for mistakes.
"""Base rotary embedding."""

def __init__(self, dim: int, base: int = 10000, scaling_factor: float = 1.0):
def __init__(self, dim: int, base: float = 10000.0, scaling_factor: float = 1.0):
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base parameter type was changed to float in the base class DlinferRotaryEmbeddingImpl, but subclasses DlinferLlama3RotaryEmbeddingImpl (line 101), DlinferYarnRotaryEmbeddingImpl (line 133), and the builder DlinferRotaryEmbeddingBuilder.build (line 154) still declare base: int = 10000. This creates an API inconsistency where the parent class expects a different type than its subclasses and builder.

All subclasses and the builder should be updated to use base: float = 10000.0 to match the parent class signature.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant