
rpc error: code = Unknown desc = unimplemented instead of chat completion #1946

@splitbrain

Description


LocalAI version:

docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:v2.11.0-aio-gpu-nvidia-cuda-12

Environment, CPU architecture, OS, and Version:

Linux rumpel 6.8.2-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 28 Mar 2024 17:06:35 +0000 x86_64 GNU/Linux

NVIDIA GPU detected
Tue Apr  2 12:23:11 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off |   00000000:01:00.0  On |                  N/A |
|  0%   58C    P8             13W /  120W |     608MiB /   6144MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
NVIDIA GPU detected. Attempting to find memory size...
Total GPU Memory: 6144 MiB

Describe the bug

I am trying to run the chat completion example from the quickstart, but instead of an answer I get an error:

$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }'
{"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}}

Expected behavior

An answer instead of an error.

Logs

In the Docker container's console output I see this:

12:26PM INF Trying to load the model '5c7cd056ecf9a4bb5b527410b97f48cb' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/vall-e-x/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/exllama/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/transformers/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/bark/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/sentencetransformers/run.sh
12:26PM INF [llama-cpp] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-cpp
12:26PM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc = 
12:26PM INF [llama-ggml] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-ggml
12:26PM INF [llama-ggml] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:26PM INF [gpt4all] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend gpt4all
12:26PM INF [gpt4all] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:26PM INF [bert-embeddings] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend bert-embeddings
12:26PM INF [bert-embeddings] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:26PM INF [rwkv] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend rwkv
12:26PM INF [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
12:26PM INF [whisper] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend whisper
12:26PM INF [whisper] Fails: could not load model: rpc error: code = Unknown desc = unable to load model
12:26PM INF [stablediffusion] Attempting to load
12:26PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend stablediffusion
12:26PM INF [stablediffusion] Loads OK

Additional context

It looks like none of the text backends can actually load the model (only stablediffusion reports "Loads OK", which obviously can't serve a chat completion), but I have no idea why.
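If it helps with triaging, I can re-run the container with debug logging enabled and attach the more verbose backend output (assuming the DEBUG environment variable from the LocalAI docs also applies to the AIO images):

$ docker run -p 8080:8080 --gpus all -e DEBUG=true --name local-ai -ti localai/localai:v2.11.0-aio-gpu-nvidia-cuda-12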

Some side notes:
