What happened?
System: Mac M1, latest macOS (14.5), latest llama.cpp (b3204), built the standard way (make).
Steps to reproduce: start llama-server with --embeddings (tested with llama3/8b/fp16 and mistral/7b/Q8_0), open the web GUI, and type anything (a command-line reproduction is sketched below).
Expected result: the system completes/responds.
Actual result: llama-server segfaults with "llama_get_logits_ith: invalid logits id 23, reason: no logits" followed by "zsh: segmentation fault".
Other notes: the same build/models behave normally when llama-server is started without --embeddings. A similar issue has been confirmed on Linux/CUDA.
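For reference, a minimal command-line sketch of the reproduction, assuming a default make build; the model path below is a placeholder. The web GUI's chat posts to the server's /completion endpoint (see completion.js in the log below), so an equivalent curl request should trigger the same crash:

```sh
# build as in the report (default make build)
make

# start the server with embeddings enabled; the model path is a placeholder
./llama-server -m models/your-model.gguf --embeddings

# in another shell: mirrors what the web GUI sends, and should segfault the server
curl --request POST --url http://127.0.0.1:8080/completion \
     --header "Content-Type: application/json" \
     --data '{"prompt": "Hello", "n_predict": 16}'
```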
Name and Version
version: 3204 (45c0e2e)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.5.0
What operating system are you seeing the problem on?
Linux, Mac
Relevant log output
[...]
llama_kv_cache_init: Metal KV buffer size = 4096.00 MiB
llama_new_context_with_model: KV self size = 4096.00 MiB, K (f16): 2048.00 MiB, V (f16): 2048.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.03 MiB
llama_new_context_with_model: Metal compute buffer size = 2144.00 MiB
llama_new_context_with_model: CPU compute buffer size = 72.01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 2
INFO [ init] initializing slots | tid="0x1f31c4c00" timestamp=1719140243 n_slots=1
INFO [ init] new slot | tid="0x1f31c4c00" timestamp=1719140243 id_slot=0 n_ctx_slot=32768
INFO [ main] model loaded | tid="0x1f31c4c00" timestamp=1719140243
INFO [ main] chat template | tid="0x1f31c4c00" timestamp=1719140243 chat_example="[INST] You are a helpful assistant\nHello [/INST]Hi there</s>[INST] How are you? [/INST]" built_in=true
INFO [ main] HTTP server listening | tid="0x1f31c4c00" timestamp=1719140243 port="8080" n_threads_http="19" hostname="127.0.0.1"
INFO [ update_slots] all slots are idle | tid="0x1f31c4c00" timestamp=1719140243
INFO [ log_server_request] request | tid="0x16b85b000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56208 status=200 method="GET" path="/" params={}
INFO [ log_server_request] request | tid="0x16b85b000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56208 status=200 method="GET" path="/index.js" params={}
INFO [ log_server_request] request | tid="0x16b8e7000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56209 status=200 method="GET" path="/completion.js" params={}
INFO [ log_server_request] request | tid="0x16b973000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56211 status=200 method="GET" path="/json-schema-to-grammar.mjs" params={}
INFO [ launch_slot_with_task] slot is processing task | tid="0x1f31c4c00" timestamp=1719140258 id_slot=0 id_task=0
INFO [ update_slots] kv cache rm [p0, end) | tid="0x1f31c4c00" timestamp=1719140258 id_slot=0 id_task=0 p0=0
llama_get_logits_ith: invalid logits id 23, reason: no logits
zsh: segmentation fault