Description
Expected Behavior
I am trying to execute this:
docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin ghcr.io/abetlen/llama-cpp-python:latest
and I expect the model to load and the server to start. I am using a model quantized by TheBloke according to the current latest ggml format (ggmlv3) of llama.cpp.
Current Behavior
llama.cpp: loading model from /models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin
Illegal instruction
Environment and Context
$ uname -a
Linux DESKTOP-xxx 5.15.68.1-microsoft-standard-WSL2+ #2 SMP
$ python3 --version
Python 3.10.9
$ make --version
GNU Make 4.3
$ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
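For context, "Illegal instruction" (SIGILL) at model-load time usually means the binary was compiled with SIMD extensions (AVX2, FMA, F16C, AVX-512) that the host CPU, or the WSL2 virtual CPU, does not advertise. A quick diagnostic sketch to list which of those flags the CPU reports (the flag names below are the standard /proc/cpuinfo spellings, not anything specific to llama.cpp):

```shell
# List which common SIMD feature flags this CPU advertises.
# If any flag the prebuilt image was compiled for shows MISSING,
# that would explain the SIGILL.
flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
for f in avx avx2 fma f16c avx512f; do
    case " $flags " in
        *" $f "*) echo "$f: present" ;;
        *)        echo "$f: MISSING" ;;
    esac
done
```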
Steps to Reproduce
docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin ghcr.io/abetlen/llama-cpp-python:latest
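A possible workaround (this is an assumption, not a confirmed fix: it presumes the SIGILL comes from AVX2/FMA/F16C code paths in the prebuilt wheel) is to rebuild llama-cpp-python with those extensions disabled, using the ggml-era CMake toggles:

```shell
# Reinstall llama-cpp-python compiled without AVX2/FMA/F16C,
# so the resulting binary avoids the instructions suspected of
# triggering the SIGILL on this CPU/WSL2 setup.
CMAKE_ARGS="-DLLAMA_AVX2=off -DLLAMA_FMA=off -DLLAMA_F16C=off" \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
```

This can be run inside the container (or baked into a local Dockerfile) in place of the prebuilt ghcr.io image.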