Integrate kleidiAI release v0.3.0 into MNN 2.9.6 #2995

Merged: 8 commits merged on Oct 28, 2024


xhzheng1895
Contributor

The KleidiAI sources are placed in source/backend/cpu/arm/kleidiAI/kai. They were downloaded from Arm's GitLab and are unchanged. These files may later be removed from the tree and downloaded at build time instead.

MNNKleidiAI.cpp is the interface between MNN and KleidiAI.

The functions of class DenseConvInt8TiledExecutor, in ConvInt8TiledExecutor.cpp, are rewritten to call KleidiAI functions. A separate execution may be implemented later.

The changes to GeometryConvUtils.cpp and ShapeTensorConvert.cpp make the input and output of DenseConvInt8TiledExecutor NCHW rather than NC4HW4, which avoids redundant pack/unpack steps and gives better performance.
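
To illustrate the pack/unpack cost mentioned above, here is a minimal sketch of what converting between NCHW and an NC4HW4-style layout involves. It assumes a single image (N = 1), float data, and a layout of [ceil(C/4)][H*W][4] with zero-padded channel tails. The function names are hypothetical and are not MNN's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: pack one NCHW image into an NC4HW4-style buffer.
// Channels are grouped in blocks of 4; a partial last block is zero-padded.
std::vector<float> packNCHWToNC4HW4(const std::vector<float>& src,
                                    std::size_t channels,
                                    std::size_t area /* H * W */) {
    assert(src.size() == channels * area);
    const std::size_t blocks = (channels + 3) / 4;
    std::vector<float> dst(blocks * area * 4, 0.0f);
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t i = 0; i < area; ++i) {
            dst[((c / 4) * area + i) * 4 + (c % 4)] = src[c * area + i];
        }
    }
    return dst;
}

// Illustrative only: the inverse conversion, dropping the padding.
std::vector<float> unpackNC4HW4ToNCHW(const std::vector<float>& src,
                                      std::size_t channels,
                                      std::size_t area /* H * W */) {
    assert(src.size() == ((channels + 3) / 4) * area * 4);
    std::vector<float> dst(channels * area);
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t i = 0; i < area; ++i) {
            dst[c * area + i] = src[((c / 4) * area + i) * 4 + (c % 4)];
        }
    }
    return dst;
}
```

Each conversion is a full pass over the tensor, so an executor that consumes and produces NCHW directly skips two such passes per layer.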

@CLAassistant

CLAassistant commented Aug 16, 2024

CLA assistant check
All committers have signed the CLA.

@wangzhaode
Collaborator

I tested the following two models on an M3 chip, and the results are incorrect:

https://modelscope.cn/models/zhaode/Qwen2-7B-Instruct-MNN
https://modelscope.cn/models/zhaode/Qwen2-1.5B-Instruct-MNN

@xhzheng1895 xhzheng1895 reopened this Aug 21, 2024
@xhzheng1895
Contributor Author

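Hi, KleidiAI currently supports only symmetrically quantized models.
Asymmetrically quantized models fall through to some of the original DenseConvInt8TiledExecutor functions. In that case the KAI_CONV_NCHW_IN_OUT macro must be turned off; otherwise the input/output format will not match what the native DenseConvInt8TiledExecutor functions expect.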
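
As a rough illustration of the distinction, the sketch below uses the common definition in which symmetric quantization has a zero offset and asymmetric quantization carries a non-zero zero point. The types and the `selectPath` helper are hypothetical and only mirror the behavior described in this comment; they are not taken from MNN or KleidiAI.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-tensor quantization parameters.
struct QuantParams {
    float scale;
    std::int32_t zeroPoint;  // 0 for symmetric quantization
};

enum class ConvPath { KleidiAI, Native };

// real = scale * (q - zeroPoint); with zeroPoint == 0 this is just scale * q.
inline float dequantize(std::int8_t q, const QuantParams& p) {
    assert(p.scale > 0.0f);
    return p.scale * static_cast<float>(static_cast<std::int32_t>(q) - p.zeroPoint);
}

// Assumed dispatch rule: symmetric weights go to the KleidiAI path,
// anything else falls back to the native implementation.
inline ConvPath selectPath(const QuantParams& p) {
    return p.zeroPoint == 0 ? ConvPath::KleidiAI : ConvPath::Native;
}
```

Under this rule, a model with any non-zero offset would take the native path, which is why the layout macro matters for those models.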

@wangzhaode
Collaborator

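OK, I tested a symmetrically quantized model and it works correctly. Decode performance is faster than MNN's original implementation.
Speeds for Qwen2-1.5B-int4 on an M3 Pro, CPU with 4 threads: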

| Implementation | prefill | decode |
| --- | --- | --- |
| MNN | 330 | 75 |
| KleidiAI | 295 | 85 |
@yiyangfan01

Here is the perf data I collected with the same model as @wangzhaode on a RedMi K60 ultra (MTK D9300 inside), 16 GB RAM, 4 threads.
Prefill improves by 57% and decode improves by 28%.
[image: perf data]

xhzheng1895 and others added 4 commits October 22, 2024 14:29
@xhzheng1895 xhzheng1895 changed the title from "Integrate kleidiAI release v0.1.0 into MNN 2.9.3" to "Integrate kleidiAI release v0.3.0 into MNN 2.9.6" Oct 22, 2024
@xhzheng1895 xhzheng1895 marked this pull request as ready for review October 22, 2024 07:16
@wangzhaode wangzhaode self-assigned this Oct 28, 2024
@wangzhaode wangzhaode merged commit 630d593 into alibaba:master Oct 28, 2024
6 checks passed