Running llama3.1 in ollama/langchain fails. #12111

Open
tkarna opened this issue Sep 23, 2024 · 3 comments

tkarna commented Sep 23, 2024

After updating ipex-llm, running llama3.1 through langchain and ollama no longer works.
A simple reproducer:

# pip install langchain langchain_community
from langchain_community.llms import Ollama

# ollama pull llama3.1:70b-instruct-q4_K_M
llm = Ollama(model="llama3.1:70b-instruct-q4_K_M")
response = llm.invoke("What is the capital of France?")
print(response)

Last known working ipex-llm version is 2.2.0b20240826.
Tested on Ubuntu 22.04, oneAPI 2024.02 (intel-basekit 2024.2.1-98) with two Intel(R) Data Center GPU Max 1100 GPUs.

Error message:

[1727090840] warming up the model with an empty run
ollama_llama_server: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_xmx_kernel.cpp:428: auto ggml_sycl_op_sdp_xmx_casual(fp16 *, fp16 *, fp16 *, fp16 *, fp16 *, float *, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, float *, float, bool, sycl::queue &)::(anonymous class)::operator()() const: Assertion `false' failed.
time=2024-09-23T11:27:23.172Z level=INFO source=server.go:629 msg="waiting for server to become available" status="llm server error"
time=2024-09-23T11:27:23.423Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)"
leonardozcm (Contributor) commented

Hi, I think we have fixed this in the latest PR. Could you try ipex-llm[cpp] >= 2.2.0b20240924 tomorrow?
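
A minimal way to pick up the nightly build and confirm what is actually loaded (a sketch, assuming a pip-based ipex-llm[cpp] install; adjust for your environment):

# pip install --pre --upgrade ipex-llm[cpp]
# Then confirm the version in the same environment that launches ollama:
from importlib.metadata import version
print(version("ipex-llm"))  # expect >= 2.2.0b20240924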


tkarna commented Sep 26, 2024

Thanks, I confirm that the simple example works now. However, when running a larger langchain agents workflow I'm still getting an error:

/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml/src/ggml-backend.c:96: GGML_ASSERT(base != NULL && "backend buffer base cannot be NULL") failed

I'll see if I can make a small reproducer.
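
In the meantime, here is a hypothetical sketch of the kind of multi-step chain that stresses the runner more than the one-line example (not the actual failing workflow; the prompt and chain structure below are made up for illustration):

# Hypothetical sketch only -- not the reporter's actual agents workflow.
# pip install langchain langchain_community
from langchain_community.llms import Ollama
from langchain_core.prompts import PromptTemplate

llm = Ollama(model="llama3.1:70b-instruct-q4_K_M")

# A longer, structured prompt exercises bigger contexts than the one-liner above.
prompt = PromptTemplate.from_template(
    "You are a meticulous research assistant.\n"
    "Question: {question}\n"
    "Think through the problem step by step, then give a short final answer."
)
chain = prompt | llm
print(chain.invoke({"question": "Compare the populations of Paris and Lyon."}))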

tklengyel commented

I still have this issue using Ollama and Open WebUI with llama3.1 as of 2.2.0b20240927.

ollama_llama_server: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_xmx_kernel.cpp:429: auto ggml_sycl_op_sdp_xmx_casual(fp16 *, fp16 *, fp16 *, fp16 *, fp16 *, float *, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, float *, float, bool, sycl::queue &)::(anonymous class)::operator()() const: Assertion `false' failed.
time=2024-09-27T18:26:03.643-04:00 level=INFO source=server.go:629 msg="waiting for server to become available" status="llm server error"
time=2024-09-27T18:26:03.893-04:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)"
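
To help narrow down whether Open WebUI is involved, one option is to hit the Ollama server directly with the official Python client (a sketch; assumes the server is running on the default localhost:11434 and that the model tag matches what Open WebUI uses):

# pip install ollama
import ollama  # official Ollama Python client; bypasses Open WebUI

response = ollama.generate(model="llama3.1", prompt="What is the capital of France?")
print(response["response"])  # the generated text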
