
Conversation

@edison240121
Contributor

No description provided.

std::memcpy(cum_vec.data(),
cum_tensor.data_ptr<int>(),
cum_tensor.numel() * sizeof(int));
;
Collaborator

What are you doing in lines 661-673?
Why not compute std::vector<int32_t> q_cu_seq_lens from std::vector<int32_t> q_seq_lens directly?

Contributor Author

Deleted.
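
For reference, the reviewer's suggestion of deriving q_cu_seq_lens directly from q_seq_lens, without the torch tensor and std::memcpy round trip, boils down to a single std::partial_sum call. A minimal sketch (variable names follow the snippets in this PR; the values are made up):

#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical per-request query lengths; in the real code these come
// from the forward input.
std::vector<int32_t> q_seq_lens = {3, 1, 5};

// Inclusive prefix sums computed directly on the host vector.
std::vector<int32_t> q_cu_seq_lens(q_seq_lens.size());
std::partial_sum(q_seq_lens.begin(), q_seq_lens.end(), q_cu_seq_lens.begin());
// q_cu_seq_lens == {3, 4, 9}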


SET_ARG(stop_token_ids, std::unordered_set<int32_t>({1}));
});
} // namespace xllm
Collaborator

why change this line?

Contributor Author

Not changed.

#include <torch/torch.h>

#include <boost/algorithm/string.hpp>
#include <string>
Collaborator

remove:

#include <gflags/gflags.h>
#include <boost/algorithm/string.hpp>

Contributor Author

Deleted.

}

} // namespace layer
} // namespace xllm
Collaborator

why change this line?

Contributor Author

Restored to the state before modification.

void initialize_quantization_parameters(
atb_speed::deepseekV2::DecoderLayerParam& param);

void initialize_kimi_k2_parameters(
Collaborator

Why do we need to initialize the kimi_k2 parameters?

Contributor Author

Deleted.

std::memcpy(cum_vec.data(),
cum_tensor.data_ptr<int>(),
cum_tensor.numel() * sizeof(int));
;
Collaborator

The trailing ; is redundant.

Contributor Author

Deleted.

raw_forward_input.q_max_seq_len = state_.q_max_seq_len;
raw_forward_input.seq_lens = std::move(state_.seq_lens);
raw_forward_input.q_seq_lens = std::move(state_.q_seq_lens);
torch::Tensor q_seq_len_tensor =
Collaborator

This torch op may not be faster than std::partial_sum.

Contributor Author

It has been changed to use std::partial_sum.


params.kv_seq_lens = safe_to(kv_seq_lens, device, true);
params.q_seq_lens = safe_to(q_seq_lens, device, true);
params.q_cu_seq_lens = safe_to(q_cu_seq_lens, device, true);
Collaborator

Why do we need both the cumulative q_cu_seq_lens and the non-cumulative q_seq_lens params?

Contributor Author

DeepSeek V3.2 requires the sparse flash attention operator, and both q_seq_lens and q_cu_seq_lens are inputs to that operator.

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
Collaborator

Must add #pragma once.

Contributor Author

Added.

pb_forward_input->q_seq_lens().end());
// aprint<int32_t>(q_seq_lens, "q_seq_lens", global_rank_);
std::vector<int32_t> q_cu_seq_lens(q_seq_lens.size());
std::partial_sum(q_seq_lens.begin(), q_seq_lens.end(), q_cu_seq_lens.begin());
Collaborator

nit: please confirm the size of q_cu_seq_lens. On MLU and GPU the first item is 0, so q_cu_seq_lens.size() == batch_size + 1. Here on NPU, q_cu_seq_lens.size() == batch_size.

Contributor Author

The documentation for the sparse flash attention operator requires that the shapes of q_seq_lens and q_cu_seq_lens be equal.
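
To make the size question concrete, the two conventions discussed above differ only in whether a leading 0 is prepended. A small sketch (values and the npu_style/gpu_style names are illustrative only):

#include <cstdint>
#include <numeric>
#include <vector>

std::vector<int32_t> q_seq_lens = {3, 1, 5};  // hypothetical batch of 3

// Convention used here (NPU path): inclusive prefix sums,
// q_cu_seq_lens.size() == batch_size.
std::vector<int32_t> npu_style(q_seq_lens.size());
std::partial_sum(q_seq_lens.begin(), q_seq_lens.end(), npu_style.begin());
// npu_style == {3, 4, 9}

// Convention on MLU/GPU as described by the reviewer: a leading 0,
// so the size is batch_size + 1.
std::vector<int32_t> gpu_style(q_seq_lens.size() + 1, 0);
std::partial_sum(q_seq_lens.begin(), q_seq_lens.end(), gpu_style.begin() + 1);
// gpu_style == {0, 3, 4, 9}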

Collaborator

@yq33victor left a comment


Merging the code to speed up the DeepSeek V3.2 testing process.

