[inference] Support groupwise mode of gemv kernel #60204

Merged
13 commits merged into PaddlePaddle:develop on Dec 27, 2023

Contributor

@freeliuzc commented Dec 20, 2023

PR types

New features

PR changes

OPs

Description

  1. Support groupwise mode in the wint4 gemv function
  2. Modify weight_only_linear
  3. Support groupwise mode for weight_dequantize and weight_quantize in int4 format
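
For readers unfamiliar with the term, groupwise quantization stores one scale per fixed-size group of weights rather than one scale for a whole channel. The following is a minimal host-side C++ sketch of that idea, not the CUDA kernel from this PR. The names GroupwiseInt4, QuantizeGroupwise, and DequantizeGroupwise are hypothetical, and the sketch assumes symmetric quantization to [-7, 7], a flat 1-D weight array, and unpacked storage (one int8_t per 4-bit value).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: quantized values plus one scale per group.
struct GroupwiseInt4 {
  std::vector<int8_t> q;      // each entry holds a 4-bit value in [-7, 7], unpacked
  std::vector<float> scales;  // scales[g] covers elements [g*group_size, (g+1)*group_size)
  std::size_t group_size = 0;
};

// Symmetric groupwise quantization (assumed scheme): scale = max|w| / 7 per group.
GroupwiseInt4 QuantizeGroupwise(const std::vector<float>& w,
                                std::size_t group_size) {
  assert(group_size > 0);
  GroupwiseInt4 out;
  out.group_size = group_size;
  out.q.resize(w.size());
  const std::size_t num_groups = (w.size() + group_size - 1) / group_size;
  out.scales.resize(num_groups);
  for (std::size_t g = 0; g < num_groups; ++g) {
    const std::size_t begin = g * group_size;
    const std::size_t end = std::min(w.size(), begin + group_size);
    float max_abs = 0.0f;
    for (std::size_t i = begin; i < end; ++i) {
      max_abs = std::max(max_abs, std::fabs(w[i]));
    }
    // An all-zero group gets scale 1 to avoid dividing by zero.
    const float scale = max_abs > 0.0f ? max_abs / 7.0f : 1.0f;
    out.scales[g] = scale;
    for (std::size_t i = begin; i < end; ++i) {
      const long v = std::lround(w[i] / scale);
      out.q[i] = static_cast<int8_t>(std::max(-7L, std::min(7L, v)));
    }
  }
  return out;
}

// Dequantization multiplies each value by the scale of the group it belongs to.
std::vector<float> DequantizeGroupwise(const GroupwiseInt4& in) {
  std::vector<float> w(in.q.size());
  for (std::size_t i = 0; i < w.size(); ++i) {
    w[i] = static_cast<float>(in.q[i]) * in.scales[i / in.group_size];
  }
  return w;
}
```

With a group size of 4, a group whose largest magnitude is 0.7 gets its own scale of roughly 0.1, so its small weights keep more resolution than they would under a single scale shared with a group that peaks at 7.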

PCard-77383

paddle-bot bot commented Dec 20, 2023

Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.

@@ -24,6 +24,8 @@ limitations under the License. */
#include "paddle/phi/common/float16.h"
#include "paddle/phi/core/kernel_registry.h"

// #define _DEBUG_WEIGHT_ONLY_GEMV

Contributor

Does this need to be kept?

Contributor Author

It can be deleted.

    int initial_offset,
    int stride)
    : _scales(scales), _zeros(zeros), _stride(stride) {
  _scales += initial_offset;
#ifndef WIN32
  // linux
  if constexpr (Zero) {

Contributor

There is no need to delete this, is there?

Contributor Author

With this included, Paddle's pre-commit check reports an error and blocks the commit. I will try again to see whether there is another way around it.

struct WeightOnlyConverter {};

template <>
struct WeightOnlyConverter<half, WeightOnlyQuantType::Int8b> {

Contributor

Isn't this the same functionality as fast_cvt_4_packed_signed_i8s_to_2_half2s?

Contributor Author

Yes, it is the same. I wrapped it in a struct so that the int4 and int8 paths can be written together.
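
The reply above describes wrapping the int8 and int4 conversions in one struct template that is specialized per quantization type. A minimal host-side sketch of that pattern, under stated assumptions, might look like the following. QuantType, Converter, SumUnpacked, and CheckConverters are hypothetical names; float stands in for half, and the plain little-endian unpacking is an assumption for illustration, not the bit layout or fast conversion used by the actual kernel.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for WeightOnlyQuantType.
enum class QuantType { Int8b, Int4b };

// Primary template is left empty; only the specializations are usable.
template <typename T, QuantType Q>
struct Converter {};

// int8 path: four signed bytes per 32-bit word.
template <>
struct Converter<float, QuantType::Int8b> {
  static constexpr int kElems = 4;
  static std::array<float, 4> Convert(uint32_t packed) {
    std::array<float, 4> out{};
    for (int i = 0; i < kElems; ++i) {
      const uint32_t byte = (packed >> (8 * i)) & 0xFFu;
      out[i] = static_cast<float>(static_cast<int8_t>(byte));
    }
    return out;
  }
};

// int4 path: eight signed nibbles per 32-bit word.
template <>
struct Converter<float, QuantType::Int4b> {
  static constexpr int kElems = 8;
  static std::array<float, 8> Convert(uint32_t packed) {
    std::array<float, 8> out{};
    for (int i = 0; i < kElems; ++i) {
      const int nibble = static_cast<int>((packed >> (4 * i)) & 0xFu);
      // Sign-extend the 4-bit two's-complement value.
      out[i] = static_cast<float>(nibble >= 8 ? nibble - 16 : nibble);
    }
    return out;
  }
};

// Generic code picks the conversion at compile time through the enum.
template <QuantType Q>
float SumUnpacked(uint32_t packed) {
  float sum = 0.0f;
  for (float v : Converter<float, Q>::Convert(packed)) sum += v;
  return sum;
}

// Usage: bytes 0x01, 0xFF, 0x80, 0x7F decode to 1, -1, -128, 127;
// nibbles 0xF, 0x8 decode to -1, -8.
inline void CheckConverters() {
  assert(SumUnpacked<QuantType::Int8b>(0x7F80FF01u) == -1.0f);
  assert(SumUnpacked<QuantType::Int4b>(0x0000008Fu) == -9.0f);
}
```

The benefit of this shape is that a kernel templated on the quantization type can call one Convert entry point without branching at run time.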

for (int p = 0; p < 16; ++p) {
  weights_f16[p * NPerBlock + idx] =
      weights_vec[p / 8 + (p % 8) * 2] * scale[idx];
#ifdef _DEBUG_WEIGHT_ONLY_GEMV

Contributor

@wwbitejotunn commented Dec 25, 2023

Let's not keep the debug output, shall we?

Contributor Author

OK, I will remove it in the next commit.
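
The index expression in the loop above, p / 8 + (p % 8) * 2, reads the 16 source values in the order 0, 2, 4, ..., 14, 1, 3, ..., 15: even positions first, then odd positions. The thread does not say why the kernel uses this order, so the small sketch below only reproduces the arithmetic on the host. SourceIndex and Gather are hypothetical helper names.

```cpp
#include <array>
#include <cassert>

// Source index read for destination slot p, copied from the expression in the diff.
constexpr int SourceIndex(int p) { return p / 8 + (p % 8) * 2; }

// Gathers 16 values in the order the loop visits them.
std::array<int, 16> Gather(const std::array<int, 16>& src) {
  std::array<int, 16> dst{};
  for (int p = 0; p < 16; ++p) {
    const int s = SourceIndex(p);
    assert(s >= 0 && s < 16);  // the mapping stays inside the 16-element vector
    dst[p] = src[s];
  }
  return dst;
}
```

Feeding in the identity sequence 0..15 makes the permutation visible directly: slots 0 to 7 receive the even source indices and slots 8 to 15 receive the odd ones.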

heavengate previously approved these changes Dec 25, 2023

Contributor

@heavengate left a comment

LGTM for group_size attr

Contributor

@jeff41404 commented

Generally speaking, adding a parameter to an operator requires updating the checkpoint information in paddle/phi/api/yaml/op_version.yaml. @heavengate, please pay attention to inference compatibility risks.

qili93 previously approved these changes Dec 25, 2023

Contributor

@qili93 left a comment

LGTM for unittest.skipIf

yuanlehome previously approved these changes Dec 25, 2023

@freeliuzc dismissed stale reviews from yuanlehome, qili93, and heavengate via 97e9192 December 25, 2023 15:49

Contributor Author

@freeliuzc commented

Quoting @jeff41404: "Generally speaking, adding a parameter to an operator requires updating the checkpoint information in paddle/phi/api/yaml/op_version.yaml. @heavengate, please pay attention to inference compatibility risks."

Added.

jeff41404 previously approved these changes Dec 26, 2023

Contributor

@jeff41404 left a comment

LGTM

Ligoml previously approved these changes Dec 26, 2023

Contributor

@Ligoml left a comment

LGTM for docs

yuanlehome previously approved these changes Dec 26, 2023

Contributor

@yuanlehome left a comment

LGTM

@freeliuzc dismissed stale reviews from Ligoml and jeff41404 via ca7a061 December 26, 2023 07:16

Contributor

@heavyrain-lzy left a comment

LGTM for YAML

Contributor

@qili93 left a comment

LGTM for unittest.skipIf

@yuanlehome merged commit 22b49df into PaddlePaddle:develop Dec 27, 2023
29 checks passed
Wanglongzhi2001 pushed a commit to Wanglongzhi2001/Paddle that referenced this pull request Jan 7, 2024
* support gemv-groupwise func && weightQuanter-groupwise && weightDeQuanter-groupwise

* fix build bug

* add unit_test && fix bug

* delete useless code

* fix ci build bug

* fix ci && optimize

* fix merge conflict

* add op change info

* fix weight_only_linear_pass

* fix format

* solve ci unit_test