intelanalytics/ipex-llm-inference-cpp-xpu:2.2.0 docker image causes memory issue with intel arc a380 #11993
Comments
Hi, yes, a smaller model (~0.3 GB) does work for me on the latest container. I think there is still an issue, though, as version 2.1.0 lets me use models that fit the system's VRAM (~6 GB). Even with all other docker containers shut down and ~14 GB of free system memory, the new container still hits this error. It's possible this is an error in the detection of SYCL devices, as the latest container does not pick up the CPU either. Although I see high CPU core usage in htop when doing inference on version 2.1.0, I can also see that hardware acceleration is being used by monitoring GPU usage with intel_gpu_top. I'm not sure how much this means to you. It was working in the previous container, but I can't get it to work in 2.2.0+, so I'm sticking with 2.1.0 for the time being.
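For reference, a quick way to check which SYCL devices the container actually sees (assuming the oneAPI `sycl-ls` utility is present in the image, as in most oneAPI-based containers), plus the GPU monitoring mentioned above:

```bash
# Inside the running container: list the SYCL devices llama.cpp/Ollama can target.
# If the A380 (or the CPU) is missing here, the problem is likely device detection
# rather than the model size itself.
sycl-ls

# On the host: live Arc GPU engine utilisation while a prompt is being processed.
sudo intel_gpu_top
```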
I'm not sure I follow the problem you're describing. Do you mean the issue exists in the latest 2.2.0 version while 2.1.0 is normal? The docker image is basically unchanged between 2.1.0 and 2.2.0. I have tested 2.2.0-snapshot on an Arc A770 and did not hit any OOM problem. Maybe it's caused by the difference in VRAM compared to the A770.
Hi, yes, while I can run LLMs of around 5 GB on 2.1.0, I can't run them on 2.2.0 with the exact same docker setup. I can run much smaller LLMs on 2.2.0, so the Ollama functionality is not completely broken, but there does seem to be a memory issue. I'm not sure where the issue lies, though. Please let me know if there is any other system information you'd like me to collect to help get to the bottom of this.
Thanks for your question. There was indeed a llama.cpp/Ollama upgrade between image 2.1.0 and 2.2.0, which may be the root cause. We will confirm the issue again. You can keep running with 2.1.0 for now.
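For anyone who wants to stay on the older image while this is investigated, here is a minimal sketch of pinning the 2.1.0 tag; the mount path, shared-memory size, and container name are placeholders to adapt to your setup:

```bash
# Pin the 2.1.0 image; --device=/dev/dri passes the Intel GPU into the container.
# The volume path, --shm-size value, and container name are placeholders.
docker run -itd \
  --net=host \
  --device=/dev/dri \
  -v /path/to/models:/root/models \
  --shm-size=16g \
  --name=ipex-llm-xpu \
  intelanalytics/ipex-llm-inference-cpp-xpu:2.1.0
```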
Hi @bobsdacool, in your log it says your
then load model with:
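The actual command that followed in the original comment was truncated in this copy of the thread. As a purely hypothetical illustration (not the maintainer's suggestion), Ollama's `num_gpu` parameter can cap how many layers are offloaded to the GPU, which is one common way to fit a model into the A380's 6 GB of VRAM; the model name and layer count below are made up:

```bash
# Hypothetical example only -- the original instruction was cut off above.
# num_gpu limits how many layers Ollama offloads to the GPU.
cat > Modelfile <<'EOF'
FROM llama3:8b
PARAMETER num_gpu 20
EOF
ollama create llama3-lowvram -f Modelfile
ollama run llama3-lowvram
```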
Hey. Not a computer scientist here, but I thought you'd like to know that the latest pushed container image is causing issues with GPU inference for me.
System specs
CPU: AMD Ryzen 3600
GPU: Intel Arc A380
RAM: 16 GB DDR4 ECC unregistered, 3200 MHz, single channel
OS: Debian 12
Kernel: 6.7.12+bpo-amd64
Docker: version 27.2.0, build 3ab4256
Logs attached.
Logs_Latest.txt
Logs_2.1.0.txt
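For completeness, one way such logs can be captured from the running container (the container name here is a placeholder):

```bash
# Redirect the container's stdout/stderr into a file to attach to the issue.
docker logs ipex-llm-xpu > Logs_Latest.txt 2>&1
```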