Inference speed and memory usage of Qwen1.5-14b #12015
Comments
Hi @WeiguangHan, we will take a look at this issue and try to reproduce it first. We'll let you know if there's any progress.
Hi @WeiguangHan, we cannot reproduce the issue on an Ultra 5 125H CPU; the CPU usage when running the qwen1.5 example script turned out pretty normal. Also, please note that it is recommended to run performance evaluation with the …
Thanks a lot. The CPU of my computer is an Ultra 7 155H, which should theoretically perform better. I will try again following your instructions.
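For context on the maintainers' point about performance evaluation, here is a minimal sketch (mine, not from the thread) of timing decode throughput by hand with the ipex-llm transformers API. The model path, prompt, and token counts are placeholder assumptions; the first generation is treated as a warm-up because it includes one-time compilation cost.

```python
import time
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Qwen/Qwen1.5-14B-Chat"  # placeholder path

model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What is AI?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    # Warm-up: the first run pays one-time compilation cost and
    # should not be counted toward throughput.
    model.generate(**inputs, max_new_tokens=32)
    torch.xpu.synchronize()

    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=128)
    torch.xpu.synchronize()
    elapsed = time.perf_counter() - start

# Count only newly generated tokens (generation may stop early at EOS).
new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"~{new_tokens / elapsed:.2f} tokens/s")
```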
I have tested the inference speed and memory usage of Qwen1.5-14b on my machine using the example in ipex-llm. The peak CPU memory usage while loading Qwen1.5-14b in 4-bit is about 24 GB, and the peak GPU memory usage is about 10 GB. The inference speed is about 4~5 tokens/s. I set the environment variables SYCL_CACHE_PERSISTENT=1 and BIGDL_LLM_XMX_DISABLED=1. Do the inference speed and CPU/GPU memory usage meet expectations? I think the peak CPU usage is too high and the speed is a little slow.
device
Intel(R) Core(TM) Ultra 7 155H 3.80 GHz
32.0 GB (31.6 GB usable)
env
intel-extension-for-pytorch 2.1.10+xpu
torch 2.1.0a0+cxx11.abi
transformers 4.44.2
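Regarding the ~24 GB peak host memory: one possible mitigation (my suggestion, not something confirmed by the maintainers in this thread) is ipex-llm's save_low_bit / load_low_bit flow, which pays the fp16-to-int4 conversion cost once and lets later runs load the already-quantized checkpoint directly. Paths below are placeholders.

```python
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Qwen/Qwen1.5-14B-Chat"   # placeholder path
low_bit_path = "./qwen1.5-14b-int4"    # placeholder path

# One-time conversion: load the fp16 weights, quantize to 4-bit, and
# save the compact checkpoint to disk.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model.save_low_bit(low_bit_path)

# Subsequent runs: reload the quantized weights directly, which should
# keep peak host memory well below a full fp16 load.
model = AutoModelForCausalLM.load_low_bit(low_bit_path,
                                          trust_remote_code=True)
model = model.to("xpu")
```

Whether this actually brings the 24 GB peak down on this particular setup would need to be verified.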