intelanalytics/ipex-llm-inference-cpp-xpu:2.2.0 docker image causes memory issue with intel arc a380 #11993

bobsdacool · 2024-09-02T18:19:51Z

Hey. Not a computer scientist here, but thought you guys'd like to know that the latest pushed container image is causing issues with gpu inference for me.

System specs
CPU: AMD Ryzen 3600
GPU: Intel arc a380
RAM: DDR4 ECC RAM unregistered 3200mhz single channel 16gb
OS: Debian 12
Kernel: 6.7.12+bpo-amd64
Docker version 27.2.0, build 3ab4256

logs attached.
Logs_Latest.txt
Logs_2.1.0.txt

hzjane · 2024-09-03T01:49:27Z

Native API failed. Native API returns: -6 (PI_ERROR_OUT_OF_HOST_MEMORY) -6 (PI_ERROR_OUT_OF_HOST_MEMORY) .This looks like an OOM error. You can try a smaller model such as dolphin-phi:latest.

bobsdacool · 2024-09-03T07:37:18Z

Hi, yes, a smaller model does work for me on the latest container ~0.3Gb. I think there is an issue though as using version 2.1.0 allows me to use models that match the systems vram ~6gb. Even when I have all other docker containers shut down, when using the new container with ~14Gb of free system memory this error persists. It's possible this is an error in detection of sycl devices as the latest container does not pick up on the CPU either. Although I get high cpu core usage when doing inference on version 2.1.0 using htop, I can also see that hardware acceleration is being utilized by monitoring the GPU usage using intel_gpu_top. I'm not sure how much this means to you. It was working in the previous container, but can't get it to work in 2.2.0+ sticking with 2.1.0 for the time being.

hzjane · 2024-09-04T01:38:23Z

I don't really know what problem you meet again? Do you mean that this problem exists in the latest 2.2.0 version, and the 2.1.0 is normal? But the docker image is basically not updated between 2.1.0 and 2.2.0. I have tested 2.2.0-snapshot on Arc A770 and no meet any OOM problem. Maybe it's caused by the VRAM different from A380 6GB and A770 16GB?

bobsdacool · 2024-09-04T07:23:23Z

Hi, yes, whilst I can run llms at like 5Gb in size in 2.1.0 I cant run them in 2.2.0 with the exact same docker setup. I can run much smaller llms in 2.2.0 so the ollama functionality is not totally bust, there does seem to be a memory issue.

I'm not sure where the issue lies though. Please let me know if there is any other system information that you'd like me to collect to help get to the bottom of this.

hzjane · 2024-09-05T01:54:27Z

Thanks for your question. There was indeed a llama.cpp/Ollama upgrade between image 2.2.0 and 2.1.0, which may be the root cause. We will confirm the issue again. And You can run it with 2.1.0 first.

JinheTang · 2024-09-05T02:26:26Z

Hi @bobsdacool , in your log it says your n_ctx = 8192. This is because the latest ollama upstream has a default setting of OLLAMA_NUM_PARALLEL=4, which sets the total space allocated for context n_ctx to 4*2048, 2048 being the model's default context space. Try running export OLLAMA_NUM_PARALLEL=1 before you start ollama serve. If the problem persists, you may manually create a Modelfile and set the model's num_ctx smaller, eg.

FROM llama2
PARAMETER num_ctx 512

then load model with:

ollama create llama2:latest-nctx512 -f Modelfile
ollama run llama2:latest-nctx512

glorysdj assigned hzjane Sep 3, 2024

glorysdj added the user issue label Sep 3, 2024

rnwang04 assigned JinheTang Sep 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

intelanalytics/ipex-llm-inference-cpp-xpu:2.2.0 docker image causes memory issue with intel arc a380 #11993

intelanalytics/ipex-llm-inference-cpp-xpu:2.2.0 docker image causes memory issue with intel arc a380 #11993

bobsdacool commented Sep 2, 2024

hzjane commented Sep 3, 2024

bobsdacool commented Sep 3, 2024 •

edited

Loading

hzjane commented Sep 4, 2024 •

edited

Loading

bobsdacool commented Sep 4, 2024

hzjane commented Sep 5, 2024 •

edited

Loading

JinheTang commented Sep 5, 2024 •

edited

Loading

intelanalytics/ipex-llm-inference-cpp-xpu:2.2.0 docker image causes memory issue with intel arc a380 #11993

intelanalytics/ipex-llm-inference-cpp-xpu:2.2.0 docker image causes memory issue with intel arc a380 #11993

Comments

bobsdacool commented Sep 2, 2024

hzjane commented Sep 3, 2024

bobsdacool commented Sep 3, 2024 • edited Loading

hzjane commented Sep 4, 2024 • edited Loading

bobsdacool commented Sep 4, 2024

hzjane commented Sep 5, 2024 • edited Loading

JinheTang commented Sep 5, 2024 • edited Loading

bobsdacool commented Sep 3, 2024 •

edited

Loading

hzjane commented Sep 4, 2024 •

edited

Loading

hzjane commented Sep 5, 2024 •

edited

Loading

JinheTang commented Sep 5, 2024 •

edited

Loading