cherry pick Habana software v1.23.0 #2380
base: master
Conversation
Select non-scalar MoE
Enable the scale-calculation CGUID for static quantization when computing the weight scale. A flow that does not go through the CGUID must divide maxabs by the fullscale and backoff factors. In PTS there is a cast to hp_dtype. Note: in test_qdq there was a type mismatch that required explicitly casting to hp_dtype in the CGUID call. Co-authored-by: linoy buchnik <lbuchnik@habana.ai>
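A minimal sketch of the non-CGUID flow described above; the fullscale and backoff constants, and the way they combine, are assumptions for illustration, not the library's actual values:

```python
import torch

FP8_FULLSCALE = 240.0  # assumed fullscale for Gaudi fp8 (E4M3)
BACKOFF = 0.5          # assumed backoff factor

def weight_scale_non_cguid(weight: torch.Tensor, hp_dtype=torch.bfloat16) -> torch.Tensor:
    maxabs = weight.abs().max()
    # divide maxabs by the fullscale and backoff factors
    scale = maxabs / (FP8_FULLSCALE * BACKOFF)
    # the PTS flow casts the result to hp_dtype
    return scale.to(hp_dtype)
```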
In the input, we use zero tokens for padding. After the linear layer, we set the corresponding positions (from the padding) to -inf, so that the softmax outputs values close to epsilon. When using the FSDPA optimization, to improve performance we avoid copying the -inf values to the softmax and instead set them directly to zero. As a result, the softmax output becomes exactly zero (as opposed to a small epsilon value without the FSDPA optimization). When computing the dynamic scale for the out_proj, this leads to a division-by-zero issue. The fix is to use max(epsilon, scale) during scale calculation. This aligns the non-CGUID code with the behavior of the CGUID flow.
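A minimal sketch of the fix, assuming an illustrative epsilon and fullscale (not the library's actual constants):

```python
import torch

EPS = 1e-7         # assumed epsilon
FULLSCALE = 240.0  # assumed fp8 fullscale

def dynamic_scale(x: torch.Tensor) -> torch.Tensor:
    maxabs = x.abs().amax()     # 0.0 for a fully padded (all-zero) row
    scale = maxabs / FULLSCALE
    return torch.clamp(scale, min=EPS)  # max(epsilon, scale) avoids a zero scale
```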
… also in ops_quantizer (#248). This prevents implicitly quantizing ops that dynamic quantization does not support.
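As a hedged illustration of the idea (the supported-op set below is invented, not the repository's actual list), matching by op name string rather than by Python type avoids implicitly treating subclasses of a supported op type as supported:

```python
DYNAMIC_QUANT_SUPPORTED_OPS = {"linear", "matmul"}  # hypothetical list

def is_dynamically_quantizable(op_name: str) -> bool:
    # a string comparison matches only ops explicitly listed, whereas an
    # isinstance() check would also match unsupported subclasses
    return op_name in DYNAMIC_QUANT_SUPPORTED_OPS
```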
[SW-237232] add support in SGLang (#291)
[SW-237037] add support for BLOCK_SOFTMAX_CONST_MAX (#292)
Co-authored-by: Kamil Kaczor <kkaczor@habana.ai>
* [SW-239679] Temporarily disable deprecated import
* Update correct import path (Co-authored-by: Xin He <xin3.he@intel.com>)
* Also disable auto-round tests
Co-authored-by: Xin He <xin3.he@intel.com>
* [PERFC-270] Add XPU QDQ tests using INC
* [PERFC-270] Add xfail markers to currently unsupported tests
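An illustrative xfail pattern (not the PR's actual test) for a case that is not yet supported on XPU:

```python
import pytest

@pytest.mark.xfail(reason="QDQ case not yet supported on XPU")
def test_qdq_xpu_unsupported_case():
    raise NotImplementedError  # placeholder for the unsupported path
```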
* Update 1-element tensor as scalar (Change-Id: I0920bf38ab6de1d8940292773062be9d1de21858)
* Clean code (Change-Id: I744ab33f7ce4711d0968589f13f672d09f22bca6)
* Fix (Change-Id: Ic6ee7f38d4b911247c3727fdcc739030f65ace49)
* Refine (Change-Id: I0492c7d5ddb3b257bf7c550e31bc8a38c7230d08)
* Update doc (Change-Id: I8e8228a1d2948807f2574c418584e911eba8d949)
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
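A minimal sketch of treating a 1-element tensor as a scalar (the helper name is hypothetical):

```python
import torch

def as_scalar(scale):
    # unwrap a single-element scale tensor into a Python scalar
    if isinstance(scale, torch.Tensor) and scale.numel() == 1:
        return scale.item()
    return scale
```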
* Pass dtype to scalar
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Change-Id: I47f5259a247bbce0c6290d1d1d1bb47071bd3256
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
* Add VllmMixtureOfExpertsOpFP8PerChannel and refine check
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
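As the class name suggests per-channel FP8 scaling, here is a hypothetical illustration (one scale per output channel instead of a single per-tensor scale); the fullscale value is an assumption:

```python
import torch

def per_channel_scales(weight: torch.Tensor, fullscale: float = 240.0) -> torch.Tensor:
    # weight: [out_features, in_features]; reduce maxabs over the input dim
    return weight.abs().amax(dim=1) / fullscale
```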
Support deleting MoE high-precision weights. This solves OOM issues in large models with MoE.
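A minimal sketch of the idea, with assumed attribute names: once the quantized copies exist, drop the high-precision expert weights so the model fits in memory:

```python
import torch

def free_hp_weights(expert: torch.nn.Module, names=("weight",)):
    for name in names:
        if hasattr(expert, name):
            delattr(expert, name)  # releases the high-precision tensor
```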
Hi @linoybu, feel free to drop any review comments! 😀
yiliu30 left a comment:
LGTM
thuang6 left a comment:
Why does the "[SW-239679] temporary fix for static quant test (#298)" commit have no file change? https://github.com/habana-internal/neural-compressor-fork/pull/298
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Force-pushed d747226 to ef55a76
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Force-pushed 553ac4d to 4d2932c
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Force-pushed 4627a5e to c0e43d7
PR Type
Enhancement, Bug fix
Description
Added QuantizedHpuBlockSoftmaxConstMax class for handling block softmax operations (see the sketch after this list)
Updated scale calculation for FSDPA to prevent division by zero
Changed dynamic quant check to use op string instead of type
Fixed scale calculation flow for CGUID in weight scaling
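For the first bullet, a hypothetical reading of a constant-max block softmax (an assumption about what the class does, not its actual implementation): stabilize softmax by subtracting a fixed constant rather than the online row max, saving one reduction per block:

```python
import torch

def softmax_const_max(scores: torch.Tensor, const_max: float = 0.0) -> torch.Tensor:
    exp = torch.exp(scores - const_max)  # constant replaces the per-row max
    return exp / exp.sum(dim=-1, keepdim=True)
```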
Diagram Walkthrough
File Walkthrough
Add Block Softmax and Update Scale Calculation (1 file)
Update Dynamic Quant Check and Op String Usage (16 files)