Description
LocalAI version:
docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:v2.11.0-aio-gpu-nvidia-cuda-12
Environment, CPU architecture, OS, and Version:
Linux rumpel 6.8.2-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 28 Mar 2024 17:06:35 +0000 x86_64 GNU/Linux
NVIDIA GPU detected
Tue Apr 2 12:23:11 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67 Driver Version: 550.67 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1060 6GB Off | 00000000:01:00.0 On | N/A |
| 0% 58C P8 13W / 120W | 608MiB / 6144MiB | 4% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
NVIDIA GPU detected. Attempting to find memory size...
Total GPU Memory: 6144 MiB
Describe the bug
I am trying to run the chat-completion example, but instead of an answer I get an error:
$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }'
{"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}}
Expected behavior
An answer instead of an error.
Logs
In the Docker console I see this:
12:26PM INF Trying to load the model '5c7cd056ecf9a4bb5b527410b97f48cb' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/vall-e-x/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/exllama/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/transformers/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/bark/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/sentencetransformers/run.sh
12:26PM INF [llama-cpp] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-cpp
12:26PM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc =
12:26PM INF [llama-ggml] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-ggml
12:26PM INF [llama-ggml] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:26PM INF [gpt4all] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend gpt4all
12:26PM INF [gpt4all] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:26PM INF [bert-embeddings] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend bert-embeddings
12:26PM INF [bert-embeddings] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:26PM INF [rwkv] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend rwkv
12:26PM INF [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
12:26PM INF [whisper] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend whisper
12:26PM INF [whisper] Fails: could not load model: rpc error: code = Unknown desc = unable to load model
12:26PM INF [stablediffusion] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend stablediffusion
12:26PM INF [stablediffusion] Loads OK
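To get more detail out of the failing backends, I could re-run the container with debug logging. This is only a sketch: it assumes the DEBUG environment variable enables verbose output in this image, and reuses the exact image tag from above.

```shell
# Re-run the AIO image with debug logging enabled (assumes DEBUG=true
# turns on verbose backend output; same image tag as in the report).
docker rm -f local-ai 2>/dev/null
docker run -p 8080:8080 --gpus all -e DEBUG=true --name local-ai -ti \
  localai/localai:v2.11.0-aio-gpu-nvidia-cuda-12
```

If that flag is honored, the llama-cpp failure above should come with a concrete reason instead of just `rpc error: code = Canceled`.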
Additional context
It seems like it can't load the model, but I have no idea why.
Some side notes:
- your docs should be updated: the latest tags don't work (see ci: latest image tags #1906 and Dockerhub images referenced in the documentation don't exist #1898), and I am not sure the tag I used above is actually the one I should use. I also tried master but got the same errors
- I find it odd that you map open source model names to proprietary names like gpt-4. Or maybe that's the actual issue and I need to specify some different model name? Which one?
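Regarding the model-name question: since LocalAI exposes an OpenAI-compatible API, I assume I can ask the server which model names it has actually registered, rather than guessing. A sketch (assuming the container from above is still running on port 8080):

```shell
# List the model names the server registered; /v1/models is part of the
# OpenAI-compatible API surface that LocalAI implements.
curl http://localhost:8080/v1/models
```

If `gpt-4` does not appear in that list, that would explain the 500 and tell me which name to use instead.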