
rpc error: code = Unknown desc = unimplemented instead of chat completion #1946

@splitbrain

Description


LocalAI version:

docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:v2.11.0-aio-gpu-nvidia-cuda-12

Environment, CPU architecture, OS, and Version:

Linux rumpel 6.8.2-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 28 Mar 2024 17:06:35 +0000 x86_64 GNU/Linux

NVIDIA GPU detected
Tue Apr  2 12:23:11 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off |   00000000:01:00.0  On |                  N/A |
|  0%   58C    P8             13W /  120W |     608MiB /   6144MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
NVIDIA GPU detected. Attempting to find memory size...
Total GPU Memory: 6144 MiB

Describe the bug

I am trying to run the chat completion example from the quickstart, but instead of an answer I get an error:

$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }'
{"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}}

Expected behavior

An answer instead of an error.

Logs

In the Docker container's console output I see this:

12:26PM INF Trying to load the model '5c7cd056ecf9a4bb5b527410b97f48cb' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/vall-e-x/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/exllama/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/transformers/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/bark/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/sentencetransformers/run.sh
12:26PM INF [llama-cpp] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-cpp
12:26PM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc = 
12:26PM INF [llama-ggml] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-ggml
12:26PM INF [llama-ggml] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:26PM INF [gpt4all] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend gpt4all
12:26PM INF [gpt4all] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:26PM INF [bert-embeddings] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend bert-embeddings
12:26PM INF [bert-embeddings] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:26PM INF [rwkv] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend rwkv
12:26PM INF [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
12:26PM INF [whisper] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend whisper
12:26PM INF [whisper] Fails: could not load model: rpc error: code = Unknown desc = unable to load model
12:26PM INF [stablediffusion] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend stablediffusion
12:26PM INF [stablediffusion] Loads OK

Additional context

It looks like none of the text backends can actually load the model (only stablediffusion reports "Loads OK", which obviously can't serve a chat completion), but I have no idea why.
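If it helps with triaging, I can re-run the container with debug logging enabled and attach the more verbose backend output (assuming the DEBUG environment variable from the LocalAI docs also applies to the AIO images):

$ docker run -p 8080:8080 --gpus all -e DEBUG=true --name local-ai -ti localai/localai:v2.11.0-aio-gpu-nvidia-cuda-12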

Some side notes:
