feat: support Qwen3 128k context via YaRN scaling. #624
base: main
Conversation
25765ff to 6961655
/gemini review
Code Review
This pull request adds support for YaRN scaling to the Qwen3 model, enabling a 128k context length. The changes involve updating the maximum context length calculation, adding a new utility function for YaRN rotary embeddings, and integrating this into the Qwen3 model implementation. The overall approach is sound. I've identified a minor issue regarding an unused function parameter that should be addressed to improve code clarity and maintainability.
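For background (not part of this PR's diff), here is a minimal libtorch sketch of how YaRN-style inverse frequencies are typically derived, following the YaRN paper. The function name, parameter ordering, and helper structure below are illustrative assumptions and do not reproduce this PR's get_yarn_rotary_embedding, whose actual implementation is in the diff hunks that follow.

```cpp
#include <torch/torch.h>

#include <algorithm>
#include <cmath>

// Sketch of YaRN frequency blending, per the YaRN paper; illustrative only.
torch::Tensor yarn_inv_freq_sketch(int64_t rotary_dim,
                                   int64_t original_max_position_embeddings,
                                   float rope_theta,
                                   float scaling_factor,
                                   float extrapolation_factor = 1.0f,
                                   float beta_fast = 32.0f,
                                   float beta_slow = 1.0f) {
  constexpr float kPi = 3.14159265358979f;

  // Per-pair frequencies of plain RoPE: theta^(2i/d) for i = 0 .. d/2 - 1.
  auto pos_freqs = torch::pow(
      rope_theta, torch::arange(0, rotary_dim, 2, torch::kFloat32) / rotary_dim);
  auto inv_freq_extrapolation = pos_freqs.reciprocal();                     // unscaled RoPE
  auto inv_freq_interpolation = (pos_freqs * scaling_factor).reciprocal();  // position interpolation

  // Dimension index at which a frequency completes `num_rotations` full turns
  // inside the original training window.
  auto correction_dim = [&](float num_rotations) {
    return rotary_dim *
           std::log(original_max_position_embeddings / (num_rotations * 2.0f * kPi)) /
           (2.0f * std::log(rope_theta));
  };
  float low = std::max(std::floor(correction_dim(beta_fast)), 0.0f);
  float high = std::min(std::ceil(correction_dim(beta_slow)),
                        static_cast<float>(rotary_dim / 2 - 1));
  if (low == high) high += 0.001f;  // avoid a zero-width ramp

  // Linear ramp over dimensions: mask 1 keeps the extrapolated (high-frequency)
  // dims, mask 0 uses the fully interpolated (low-frequency) dims.
  auto ramp = torch::clamp(
      (torch::arange(rotary_dim / 2, torch::kFloat32) - low) / (high - low), 0.0, 1.0);
  auto extrapolation_mask = (1.0 - ramp) * extrapolation_factor;

  // Blend interpolation and extrapolation per dimension.
  return inv_freq_interpolation * (1.0 - extrapolation_mask) +
         inv_freq_extrapolation * extrapolation_mask;
}
```

In a YaRN rotary cache, these blended frequencies would then be expanded over positions up to roughly scaling_factor * original_max_position_embeddings, with the resulting cos/sin values scaled by the paper's attention factor (attn_factor * (0.1 * ln(scaling_factor) + 1.0)), which is presumably what the attn_factor parameter below feeds into.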
torch::Tensor get_yarn_rotary_embedding(
    int64_t rotary_dim,
    int64_t max_position_embeddings,
    int64_t original_max_position_embeddings,
    float rope_theta,
    bool interleaved,
    float scaling_factor,
    const torch::TensorOptions& options,
    float extrapolation_factor = 1.0f,
    float attn_factor = 1.0f,
    float beta_fast = 32.0f,
    float beta_slow = 1.0f);
The max_position_embeddings parameter is unused in the implementation of get_yarn_rotary_embedding. To improve code clarity and prevent confusion, it should be removed from the function declaration. I will suggest corresponding changes in the implementation file and at the call site.
torch::Tensor get_yarn_rotary_embedding(
    int64_t rotary_dim,
    int64_t original_max_position_embeddings,
    float rope_theta,
    bool interleaved,
    float scaling_factor,
    const torch::TensorOptions& options,
    float extrapolation_factor = 1.0f,
    float attn_factor = 1.0f,
    float beta_fast = 32.0f,
    float beta_slow = 1.0f);

torch::Tensor get_yarn_rotary_embedding(
    int64_t rotary_dim,
    int64_t max_position_embeddings,
    int64_t original_max_position_embeddings,
    float rope_theta,
    bool interleaved,
    float scaling_factor,
    const torch::TensorOptions& options,
    float extrapolation_factor,
    float attn_factor,
    float beta_fast,
    float beta_slow) {
The max_position_embeddings parameter is not used within this function. It should be removed from the function signature to align with the declaration change and improve code maintainability.
torch::Tensor get_yarn_rotary_embedding(
    int64_t rotary_dim,
    int64_t original_max_position_embeddings,
    float rope_theta,
    bool interleaved,
    float scaling_factor,
    const torch::TensorOptions& options,
    float extrapolation_factor,
    float attn_factor,
    float beta_fast,
    float beta_slow) {

cos_sin_ = layer::rotary::get_yarn_rotary_embedding(
    128,
    model_args.max_position_embeddings(),
    model_args.rope_scaling_original_max_position_embeddings(),
    model_args.rope_theta(),
    false,
    model_args.rope_scaling_factor(),
    options);
This argument corresponds to the unused max_position_embeddings parameter in get_yarn_rotary_embedding. It should be removed from the function call to match the updated function signature.
cos_sin_ = layer::rotary::get_yarn_rotary_embedding(
    128,
    model_args.rope_scaling_original_max_position_embeddings(),
    model_args.rope_theta(),
    false,
    model_args.rope_scaling_factor(),
    options);
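As a sanity check on the suggested call site, here is a hedged usage sketch of the trimmed signature. The concrete values (rotary dim 128, original window 32768, rope_theta 1e6, scaling factor 4.0) are assumptions drawn from Qwen3's published YaRN recipe rather than from this diff; with them the cache covers 4 * 32768 = 131072 positions, i.e. the advertised 128k context.

```cpp
// Illustrative only; the real call reads every value from model_args.
auto options = torch::dtype(torch::kFloat32).device(torch::kCPU);
torch::Tensor cos_sin = get_yarn_rotary_embedding(
    /*rotary_dim=*/128,
    /*original_max_position_embeddings=*/32768,
    /*rope_theta=*/1000000.0f,
    /*interleaved=*/false,
    /*scaling_factor=*/4.0f,
    options);
// The declaration's defaults cover the tuning knobs:
// extrapolation_factor = 1.0f, attn_factor = 1.0f, beta_fast = 32.0f, beta_slow = 1.0f.
```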
No description provided.