fix accuracy #4142
Conversation
Pull request overview
This PR fixes accuracy issues in the dlinfer backend by improving numerical precision in rotary embeddings, updating tensor type conversions, and upgrading dependencies to stable releases.
- Refactored rotary embedding inverse frequency calculation to use native float types and eliminate unnecessary type conversions
- Updated Ascend NPU backend tensor operations to use explicit int32 conversions for improved compatibility
- Added support for grouped MoE routing with the `n_groups` parameter
- Upgraded CANN and torch-npu dependencies from release candidates to stable versions
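To make the first bullet concrete, here is a minimal sketch of computing the rotary inverse frequencies with a native float base, assuming the standard RoPE formula; the function name and structure are illustrative rather than the actual lmdeploy code:

```python
import torch


def make_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    # Keep `base` as a Python float so the exponentiation stays in floating
    # point end to end; mixing an int base with integer indices can route
    # intermediates through int64 and lose precision before the final cast.
    exponent = torch.arange(0, dim, 2, dtype=torch.float32) / dim
    return 1.0 / (base**exponent)
```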
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lmdeploy/pytorch/backends/dlinfer/rotary_embedding.py | Improved numerical precision by changing base parameter to float and refactoring inv_freq calculation to avoid intermediate int64 conversions |
| lmdeploy/pytorch/backends/dlinfer/moe.py | Added `n_groups` parameter support to align with the base `SoftmaxTopKBuilder` interface (see the sketch after this table) |
| lmdeploy/pytorch/backends/dlinfer/ascend/op_backend.py | Updated tensor type conversions to explicitly use int32 for Ascend NPU compatibility |
| docker/Dockerfile_ascend_a3 | Upgraded CANN from 8.3.rc1.alpha002 to 8.3.rc1 and torch-npu from 2.8.0rc1 to 2.8.0 |
| docker/Dockerfile_ascend_a2_300i | Upgraded CANN from 8.3.rc1.alpha002 to 8.3.rc1 and torch-npu from 2.8.0rc1 to 2.8.0 |
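For the moe.py row above, a rough illustration of accepting `n_groups` in the dlinfer builder so its signature lines up with the base `SoftmaxTopKBuilder` interface; the class and method signatures here are assumptions for illustration, not the actual lmdeploy definitions:

```python
class SoftmaxTopKBuilder:
    """Hypothetical base interface whose build() accepts n_groups."""

    @staticmethod
    def build(top_k: int, dim: int = -1, n_groups: int = -1):
        raise NotImplementedError


class DlinferSoftmaxTopKBuilder(SoftmaxTopKBuilder):
    """Accept n_groups so the signature matches the base interface,
    even when the dlinfer path does not consume grouped routing."""

    @staticmethod
    def build(top_k: int, dim: int = -1, n_groups: int = -1):
        # Only forward the arguments the dlinfer implementation actually uses;
        # the return value is a stand-in for the real implementation object.
        return {"top_k": top_k, "dim": dim}
```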
```diff
 kv_seqlens = kv_seqlens.repeat_interleave(step_context.q_seqlens, 0)
 if not is_unpaged_prefill and AscendOpsBackend.enable_aclgraph():
-    kv_seqlens = kv_seqlens.cpu().tolist()
+    kv_seqlens = kv_seqlens.cpu().to(torch.int32)
```
Copilot AI · Nov 24, 2025
The variable `kv_seqlens` is already converted to CPU and `torch.int32` at line 254, so this conversion at line 273 (`.cpu().to(torch.int32)`) is redundant: the tensor is already on CPU and in int32 format. The original `.tolist()` conversion was likely there for a specific purpose, but if the change is intentional, the redundant `.cpu()` call can be removed.
Suggested change:

```diff
-kv_seqlens = kv_seqlens.cpu().to(torch.int32)
+pass
```
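As a side note on the redundancy flagged above, `torch.Tensor.to` is documented to return `self` when the tensor already has the requested dtype and device, and `.cpu()` likewise returns the original object for a tensor already in CPU memory, so repeating the conversion is a no-op. A small standalone check (variable values are illustrative):

```python
import torch

kv_seqlens = torch.tensor([3, 5, 7], dtype=torch.int64)

# First conversion, mirroring the earlier line the reviewer points to.
kv_seqlens = kv_seqlens.cpu().to(torch.int32)

# Repeating the conversion returns the same tensor object, since both the
# device and the dtype already match.
again = kv_seqlens.cpu().to(torch.int32)
print(again is kv_seqlens)  # True
```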
| """Base rotary embedding.""" | ||
|
|
||
| def __init__(self, dim: int, base: int = 10000, scaling_factor: float = 1.0): | ||
| def __init__(self, dim: int, base: float = 10000.0, scaling_factor: float = 1.0): |
Copilot AI · Nov 24, 2025
The `base` parameter type was changed to `float` in the base class `DlinferRotaryEmbeddingImpl`, but the subclasses `DlinferLlama3RotaryEmbeddingImpl` (line 101) and `DlinferYarnRotaryEmbeddingImpl` (line 133), as well as the builder `DlinferRotaryEmbeddingBuilder.build` (line 154), still declare `base: int = 10000`. This creates an API inconsistency where the parent class expects a different type than its subclasses and builder.
All subclasses and the builder should be updated to use `base: float = 10000.0` to match the parent class signature.
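Concretely, the aligned signatures the comment asks for would look roughly like the following; the model-specific arguments the real classes take (Llama3 and YaRN scaling parameters) are omitted, so treat this as a sketch of the typing change only:

```python
class DlinferRotaryEmbeddingImpl:

    def __init__(self, dim: int, base: float = 10000.0, scaling_factor: float = 1.0):
        ...


class DlinferLlama3RotaryEmbeddingImpl(DlinferRotaryEmbeddingImpl):

    def __init__(self, dim: int, base: float = 10000.0, scaling_factor: float = 1.0):
        # Llama3-specific arguments omitted; only the base typing matters here.
        super().__init__(dim, base, scaling_factor)


class DlinferYarnRotaryEmbeddingImpl(DlinferRotaryEmbeddingImpl):

    def __init__(self, dim: int, base: float = 10000.0, scaling_factor: float = 1.0):
        # YaRN-specific arguments omitted; only the base typing matters here.
        super().__init__(dim, base, scaling_factor)


class DlinferRotaryEmbeddingBuilder:

    @staticmethod
    def build(dim: int, base: float = 10000.0, scaling_factor: float = 1.0):
        return DlinferRotaryEmbeddingImpl(dim, base, scaling_factor)
```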