Description
Failure Information (for bugs)
I tried to use the model in text-generation-webui with the llamacpp loader before running the example code on the file directly; both exited with the same error. Is this because the model's architecture is different from Llama and is not supported (yet)?
Loading succeeds, but text generation fails on an assertion that appears to come from llama.cpp; the program aborts with this error message:
GGML_ASSERT: /tmp/pip-install-ao0hk6o5/llama-cpp-python_d37ca6975e1d43449c7282c5f7ed9b9b/vendor/llama.cpp/ggml.c:12913: ne1 + n_past == ne0
This GGUF file was converted from Baichuan2-13B-Base using the conversion script found in llama.cpp, and the same file works fine with llama.cpp itself.
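For reference, the conversion step looked roughly like this (the script name matches the one shipped in the llama.cpp tree at the time; the model path is a placeholder):

# convert the HF checkpoint to GGUF with llama.cpp's Baichuan script
python convert-baichuan-hf-to-gguf.py /path/to/Baichuan2-13B-Base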
Environment and Context
Debian 12, Linux 6.1 on amd64
CPU: 13th Gen Intel(R) Core(TM) i9-13900HX
GPU: GeForce RTX 4080 Laptop
Python 3.11.2
GNU Make 4.3
g++ 12.2.0
CUDA 12.2
Steps to Reproduce
- Install: CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
- Run the example code:
from llama_cpp import Llama

llm = Llama(model_path="./models/baichuan.gguf")  # loading succeeds
# generation aborts here with the GGML_ASSERT shown above
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)
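By contrast, the same GGUF file generates text without problems when driven by llama.cpp's own binary, e.g. (assumed invocation; flags may vary by build):

# works when run through llama.cpp directly
./main -m ./models/baichuan.gguf -p "Q: Name the planets in the solar system? A: " -n 32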