After updating ipex-llm, running llama3.1 through langchain and ollama no longer works.
A simple reproducer:
# pip install langchain langchain_community
from langchain_community.llms import Ollama

# ollama pull llama3.1:70b-instruct-q4_K_M
llm = Ollama(model="llama3.1:70b-instruct-q4_K_M")
response = llm.invoke("What is the capital of France?")
print(response)
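As an aside (not part of the original report), it can help to rule out a plain connectivity problem before blaming the backend: Ollama exposes a `/api/tags` endpoint listing locally pulled models. The helper below is a hypothetical sketch assuming the default server address `http://localhost:11434`.

```python
# Sanity check: confirm the local Ollama server is reachable and the model
# was actually pulled, before invoking it through langchain.
import json
from urllib.request import urlopen
from urllib.error import URLError

def list_local_models(base_url="http://localhost:11434"):
    """Return model names known to the Ollama server, or None if unreachable."""
    try:
        with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (URLError, OSError):
        return None

models = list_local_models()
if models is None:
    print("Ollama server not reachable")
elif "llama3.1:70b-instruct-q4_K_M" not in models:
    print("model not pulled yet")
```

If the model shows up here but `llm.invoke` still aborts, the failure is in the runner itself, as the assertion in `sdp_xmx_kernel.cpp` below indicates.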
Last known working ipex-llm version is 2.2.0b20240826.
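To check whether the environment has regressed past that build, the installed distribution version can be read from package metadata; `installed_version` below is a hypothetical helper, not part of ipex-llm.

```python
# Print the installed ipex-llm build so it can be compared against the
# last known working version reported above (2.2.0b20240826).
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

LAST_GOOD = "2.2.0b20240826"
print(installed_version("ipex-llm") or "ipex-llm not installed")
```

Pinning back with `pip install ipex-llm==2.2.0b20240826` (plus whatever extra index your setup uses) is one way to confirm the regression is version-bound.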
Tested on Ubuntu 22.04, oneAPI 2024.02 (intel-basekit 2024.2.1-98) with two Intel(R) Data Center GPU Max 1100 GPUs.
Error message:
[1727090840] warming up the model with an empty run
ollama_llama_server: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_xmx_kernel.cpp:428: auto ggml_sycl_op_sdp_xmx_casual(fp16 *, fp16 *, fp16 *, fp16 *, fp16 *, float *, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, float *, float, bool, sycl::queue &)::(anonymous class)::operator()() const: Assertion `false' failed.
time=2024-09-23T11:27:23.172Z level=INFO source=server.go:629 msg="waiting for server to become available" status="llm server error"
time=2024-09-23T11:27:23.423Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)"