
Conversation

@xin3he
Contributor

@xin3he xin3he commented Jan 6, 2026

PR Type

Enhancement, Bug fix


Description

  • Added QuantizedHpuBlockSoftmaxConstMax class for handling block softmax operations

  • Updated scale calculation for FSDPA to prevent division by zero

  • Changed dynamic quant check to use op string instead of type

  • Fixed scale calculation flow for CGUID in weight scaling


Diagram Walkthrough

flowchart LR
  A["Add QuantizedHpuBlockSoftmaxConstMax"] -- "Handle block softmax" --> B["Update FSDPA scale calculation"]
  B -- "Prevent division by zero" --> C["Change dynamic quant check"]
  C -- "Use op string" --> D["Fix CGUID scale calculation"]

File Walkthrough

Relevant files

Enhancement (1 file)
  hpu_quantized_func_wrapper.py: Add Block Softmax and Update Scale Calculation (+23/-11)

Bug fix (1 file)
  quantize.py: Update Dynamic Quant Check and Op String Usage (+2/-2)

Additional files (16 files)
  common.py +5/-0
  external_func_impl.py +40/-0
  fp_utils.py +3/-3
  patching_common.py +7/-1
  quantized_func_wrapper.py +1/-0
  xpu_quantized_func_wrapper.py +4/-4
  scale.py +1/-1
  scale_handler.py +4/-0
  ops_quantizer.py +27/-21
  round_scales_function.py +2/-2
  scales_method.py +52/-45
  utils.py +1/-1
  vllm_functions.py +0/-32
  helper_modules.py +197/-50
  quant_config.py +4/-4
  test_xpu_basic.py +79/-8

ulivne and others added 16 commits January 5, 2026 09:54
Enable the CGUID scale-calculation path for static quantization when computing the weight scale.
A flow that does not go through CGUID needs to divide maxabs by the fullscale and backoff factor.
In PTS there is a cast to hp_dtype.
Note: in test_qdq there was a type mismatch that required explicitly casting to hp_dtype in the CGUID call.

Co-authored-by: linoy buchnik <lbuchnik@habana.ai>
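
For reference, a minimal sketch of the non-CGUID flow described above (the function name and the hp_dtype default are illustrative, not the actual implementation; the CGUID path instead goes through torch.ops.hpu.calculate_scale_for_cast, as shown in the reviewer guide below):

import torch

def weight_scale_without_cguid(weight, fullscale, backoff, hp_dtype=torch.bfloat16):
    """Illustrative non-CGUID weight-scale calculation."""
    # Divide the tensor's max absolute value by fullscale and the backoff factor...
    xmaxabs = weight.abs().max()
    scale = xmaxabs / (fullscale * backoff)
    # ...and cast to the high-precision dtype, as PTS does.
    return scale.to(hp_dtype)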
In the input, we use zero tokens for padding. After the linear layer, we set the corresponding (padding) positions to -inf, so that the softmax outputs values close to epsilon.

When using the FSDPA optimization, to improve performance we avoid copying the -inf values into the softmax and instead set those positions directly to zero. As a result, the softmax output becomes exactly zero (as opposed to a small epsilon value without the FSDPA optimization).

When computing the dynamic scale for the out_proj, this leads to a division-by-zero issue.

The fix is to use max(epsilon, scale) during the scale calculation.
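
A minimal sketch of that clamp, assuming a hypothetical dynamic-scale helper (the names and the epsilon value are illustrative, not the actual out_proj code path):

import torch

EPS = torch.finfo(torch.bfloat16).tiny  # illustrative epsilon; the real value may differ

def dynamic_scale_from_maxabs(x, fullscale, backoff=1.0):
    # With the FSDPA optimization, fully padded rows come out of the softmax as
    # exactly zero, so x.abs().max() can be 0 and the naive scale would be 0 too,
    # causing a division by zero once values are later divided by the scale.
    scale = x.abs().max() / (fullscale * backoff)
    return torch.clamp(scale, min=EPS)  # max(epsilon, scale)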

This fix aligns non-CGUID code to act the same as the CGUID flow
… also in ops_quantizer (#248)

This prevents implicitly applying dynamic quantization to ops that do not support it.
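
As an illustration of checking the op string rather than the type (all names here are hypothetical; the actual checks live in quantize.py and ops_quantizer.py):

# Ops explicitly validated for dynamic quantization (hypothetical list).
DYNAMIC_QUANT_SUPPORTED_OPS = {"linear", "matmul"}

def should_quantize_dynamically(op_name: str) -> bool:
    # An exact string match keeps the check explicit: a type/isinstance check
    # would also match subclasses and silently pull in ops that were never
    # validated for dynamic quantization.
    return op_name in DYNAMIC_QUANT_SUPPORTED_OPS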
[SW-237232] add support in SGLang (#291)
[SW-237037] add support for BLOCK_SOFTMAX_CONST_MAX (#292)
Co-authored-by: Kamil Kaczor <kkaczor@habana.ai>
* [SW-239679] temporarily disable deprecated import

* Update correct import path

Co-authored-by: Xin He <xin3.he@intel.com>

* Also disable auto round tests

---------

Co-authored-by: Xin He <xin3.he@intel.com>
* [PERFC-270] add xpu qdq tests using inc

* [PERFC-270] - add xfail markers to currently unsupported tests
* update 1-element tensor as scalar

Change-Id: I0920bf38ab6de1d8940292773062be9d1de21858
Signed-off-by: Yi Liu <yiliu4@habana.ai>

* clean code

Change-Id: I744ab33f7ce4711d0968589f13f672d09f22bca6
Signed-off-by: Yi Liu <yiliu4@habana.ai>

* fix

Change-Id: Ic6ee7f38d4b911247c3727fdcc739030f65ace49
Signed-off-by: Yi Liu <yiliu4@habana.ai>

* refine

Change-Id: I0492c7d5ddb3b257bf7c550e31bc8a38c7230d08
Signed-off-by: Yi Liu <yiliu4@habana.ai>

* update doc

Change-Id: I8e8228a1d2948807f2574c418584e911eba8d949
Signed-off-by: Yi Liu <yiliu4@habana.ai>

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
* pass dtype to scalar

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Change-Id: I47f5259a247bbce0c6290d1d1d1bb47071bd3256

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
* add VllmMixtureOfExpertsOpFP8PerChannel and refine check

---------

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Support deletion of MoE high-precision weights
This solves OOM issues in large models with MoE
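
A rough sketch of the idea, assuming a hypothetical layout in which each expert keeps its original weights under an orig_weight attribute (neither name comes from this PR):

import gc
import torch

def free_high_precision_moe_weights(moe_module: torch.nn.Module) -> None:
    # Once the quantized expert weights are materialized, the original
    # high-precision copies are no longer needed; dropping the references
    # avoids holding two full sets of MoE weights in device memory.
    for expert in getattr(moe_module, "experts", []):    # hypothetical attribute
        if getattr(expert, "orig_weight", None) is not None:
            expert.orig_weight = None                    # hypothetical attribute
    gc.collect()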
@PRAgent4INC
Collaborator

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Naming Consistency

The new functions calculate_scale_maxabs_with_cguid and calculate_scale_rounding_with_cguid have names that include cguid, but the old functions calculate_scale_maxabs and calculate_scale_rounding do not. Ensure that the naming is consistent or that the addition of cguid is justified and documented.

def calculate_scale_maxabs_with_cguid(x, maxMode, **kwargs):
    return torch.ops.hpu.calculate_scale_for_cast(
        x, maxMode.value, ScaleCalculationRoundingMode.NO_SCALE_ROUNDING.value, **kwargs
    )


def calculate_scale_rounding_with_cguid(x, scaleMode, **kwargs):
    return torch.ops.hpu.calculate_scale_for_cast(
        x, ScaleCalculationMaxMode.NO_MAX_CALCULATION.value, scaleMode.value, **kwargs
    )
Function Renaming Impact

The renaming of calc_maxabs_scale to calc_scale_from_maxabs could affect other parts of the codebase that rely on the original function name. Verify that all references to the old function name have been updated accordingly.

def calc_scale_from_maxabs(xmaxabs, fullscale, backoff=1):
    scale = xmaxabs / (fullscale * backoff)
    return scale
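
For example (numbers chosen purely for illustration; the full scale is device-specific in practice), a call site that previously used calc_maxabs_scale would now read:

import torch

xmaxabs = torch.tensor(3.0)   # max absolute value observed in the tensor
fullscale = 448.0             # illustrative FP8 E4M3 full scale
backoff = 0.5                 # illustrative backoff factor

scale = calc_scale_from_maxabs(xmaxabs, fullscale, backoff)  # 3.0 / (448.0 * 0.5) ≈ 0.0134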

@PRAgent4INC
Collaborator

Failed to generate code suggestions for PR

@xin3he
Contributor Author

xin3he commented Jan 6, 2026

Hi @linoybu, feel free to drop any review comments! 😀

@xin3he xin3he requested a review from XuehaoSun January 6, 2026 02:41
@xin3he xin3he added this to the 3.7.1 milestone Jan 6, 2026
Contributor

@yiliu30 yiliu30 left a comment


LGTM

Contributor

@thuang6 thuang6 left a comment


Why does the "[SW-239679] temporary fix for static quant test (#298)" commit have no file change?

@xin3he
Contributor Author

xin3he commented Jan 8, 2026

Why does the "[SW-239679] temporary fix for static quant test (#298)" commit have no file change?

https://github.com/habana-internal/neural-compressor-fork/pull/298
Thanks for raising that. The cherry-pick is empty because that fix is already in INC, and the other file, test_autoround.py, has been moved to another place.
The cherry-pick is not finished yet. Please expect more changes; hopefully we can enable test_autoround.py again.

Signed-off-by: xinhe3 <xinhe3@habana.ai>
@xin3he xin3he force-pushed the xinhe/cherry-pick-v1.23.0 branch from d747226 to ef55a76 Compare January 8, 2026 05:39
Signed-off-by: xinhe3 <xinhe3@habana.ai>
@xin3he xin3he force-pushed the xinhe/cherry-pick-v1.23.0 branch 2 times, most recently from 553ac4d to 4d2932c Compare January 9, 2026 06:06
Signed-off-by: xinhe3 <xinhe3@habana.ai>
@xin3he xin3he force-pushed the xinhe/cherry-pick-v1.23.0 branch from 4627a5e to c0e43d7 Compare January 9, 2026 14:43