What happened?
System: Mac M1, latest macOS (14.5), latest llama.cpp (b3204), built the standard way (make).
Steps to reproduce: start llama-server with --embeddings (tested with llama3/8b/fp16 and mistral/7b/Q8_0), open the web GUI, and type anything (a command-line reproduction is sketched below).
Expected result: the system completes/responds.
Actual result: llama-server segfaults with "llama_get_logits_ith: invalid logits id 23, reason: no logits" followed by "zsh: segmentation fault".
Other notes: the same build/models behave normally when llama-server is started without --embeddings. A similar issue has been confirmed on Linux/CUDA.
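For reference, a minimal command-line sketch of the reproduction, assuming a default make build; the model path below is a placeholder. The web GUI's chat posts to the server's /completion endpoint (see completion.js in the log below), so an equivalent curl request should trigger the same crash:

```sh
# build as in the report (default make build)
make

# start the server with embeddings enabled; the model path is a placeholder
./llama-server -m models/your-model.gguf --embeddings

# in another shell: mirrors what the web GUI sends, and should segfault the server
curl --request POST --url http://127.0.0.1:8080/completion \
     --header "Content-Type: application/json" \
     --data '{"prompt": "Hello", "n_predict": 16}'
```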
Name and Version
version: 3204 (45c0e2e)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.5.0
What operating system are you seeing the problem on?
Linux, Mac
Relevant log output
[...]
llama_kv_cache_init: Metal KV buffer size = 4096.00 MiB
llama_new_context_with_model: KV self size = 4096.00 MiB, K (f16): 2048.00 MiB, V (f16): 2048.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.03 MiB
llama_new_context_with_model: Metal compute buffer size = 2144.00 MiB
llama_new_context_with_model: CPU compute buffer size = 72.01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 2
INFO [ init] initializing slots | tid="0x1f31c4c00" timestamp=1719140243 n_slots=1
INFO [ init] new slot | tid="0x1f31c4c00" timestamp=1719140243 id_slot=0 n_ctx_slot=32768
INFO [ main] model loaded | tid="0x1f31c4c00" timestamp=1719140243
INFO [ main] chat template | tid="0x1f31c4c00" timestamp=1719140243 chat_example="[INST] You are a helpful assistant\nHello [/INST]Hi there</s>[INST] How are you? [/INST]" built_in=true
INFO [ main] HTTP server listening | tid="0x1f31c4c00" timestamp=1719140243 port="8080" n_threads_http="19" hostname="127.0.0.1"
INFO [ update_slots] all slots are idle | tid="0x1f31c4c00" timestamp=1719140243
INFO [ log_server_request] request | tid="0x16b85b000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56208 status=200 method="GET" path="/" params={}
INFO [ log_server_request] request | tid="0x16b85b000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56208 status=200 method="GET" path="/index.js" params={}
INFO [ log_server_request] request | tid="0x16b8e7000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56209 status=200 method="GET" path="/completion.js" params={}
INFO [ log_server_request] request | tid="0x16b973000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56211 status=200 method="GET" path="/json-schema-to-grammar.mjs" params={}
INFO [ launch_slot_with_task] slot is processing task | tid="0x1f31c4c00" timestamp=1719140258 id_slot=0 id_task=0
INFO [ update_slots] kv cache rm [p0, end) | tid="0x1f31c4c00" timestamp=1719140258 id_slot=0 id_task=0 p0=0
llama_get_logits_ith: invalid logits id 23, reason: no logits
zsh: segmentation fault