Bug: llama-server crashes when started with --embeddings #8076

Closed
@marcingomulkiewicz

Description

What happened?

System: Mac M1, latest macOS (14.5), latest llama.cpp (b3204), built the standard way (make).
Steps to reproduce: start llama-server with --embeddings (tested with llama3/8b/fp16 and mistral/7b/Q8_0), open the web GUI, and type anything; a command-line sketch of the same path is given below.
Expected result: the system completes/responds.
Actual result: llama-server segfaults: llama_get_logits_ith: invalid logits id 23, reason: no logits, followed by zsh: segmentation fault.
Other notes: the same build/models behave normally when llama-server is started without --embeddings. The same issue has been confirmed on Linux/CUDA.
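
For completeness, a minimal command-line repro along the lines of the steps above (the model path and prompt are placeholders; the curl request hits the same /completion code path the web GUI uses):

```sh
# Start the server with embeddings enabled (model path is a placeholder).
./llama-server -m models/llama3-8b-f16.gguf --embeddings --port 8080

# In another shell, request a completion -- the same code path the GUI uses.
curl http://127.0.0.1:8080/completion -d '{"prompt": "Hello", "n_predict": 16}'
# Observed: the server prints
#   llama_get_logits_ith: invalid logits id ..., reason: no logits
# and then segfaults.
```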

Name and Version

version: 3204 (45c0e2e)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.5.0

What operating system are you seeing the problem on?

Linux, Mac

Relevant log output

[...]
llama_kv_cache_init:      Metal KV buffer size =  4096.00 MiB
llama_new_context_with_model: KV self size  = 4096.00 MiB, K (f16): 2048.00 MiB, V (f16): 2048.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.03 MiB
llama_new_context_with_model:      Metal compute buffer size =  2144.00 MiB
llama_new_context_with_model:        CPU compute buffer size =    72.01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 2
INFO [                    init] initializing slots | tid="0x1f31c4c00" timestamp=1719140243 n_slots=1
INFO [                    init] new slot | tid="0x1f31c4c00" timestamp=1719140243 id_slot=0 n_ctx_slot=32768
INFO [                    main] model loaded | tid="0x1f31c4c00" timestamp=1719140243
INFO [                    main] chat template | tid="0x1f31c4c00" timestamp=1719140243 chat_example="[INST] You are a helpful assistant\nHello [/INST]Hi there</s>[INST] How are you? [/INST]" built_in=true
INFO [                    main] HTTP server listening | tid="0x1f31c4c00" timestamp=1719140243 port="8080" n_threads_http="19" hostname="127.0.0.1"
INFO [            update_slots] all slots are idle | tid="0x1f31c4c00" timestamp=1719140243
INFO [      log_server_request] request | tid="0x16b85b000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56208 status=200 method="GET" path="/" params={}
INFO [      log_server_request] request | tid="0x16b85b000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56208 status=200 method="GET" path="/index.js" params={}
INFO [      log_server_request] request | tid="0x16b8e7000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56209 status=200 method="GET" path="/completion.js" params={}
INFO [      log_server_request] request | tid="0x16b973000" timestamp=1719140252 remote_addr="127.0.0.1" remote_port=56211 status=200 method="GET" path="/json-schema-to-grammar.mjs" params={}
INFO [   launch_slot_with_task] slot is processing task | tid="0x1f31c4c00" timestamp=1719140258 id_slot=0 id_task=0
INFO [            update_slots] kv cache rm [p0, end) | tid="0x1f31c4c00" timestamp=1719140258 id_slot=0 id_task=0 p0=0
llama_get_logits_ith: invalid logits id 23, reason: no logits
zsh: segmentation fault
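
The crash is consistent with llama_get_logits_ith returning NULL (after logging the error above) and the server then dereferencing that pointer. A minimal sketch at the libllama level, assuming --embeddings corresponds to setting embeddings = true in llama_context_params as of b3204 (untested, illustrative only):

```cpp
// Sketch: provoke "no logits" from an embeddings-enabled context.
// Assumes llama.h from around b3204; error handling omitted for brevity.
#include "llama.h"
#include <cstdio>
#include <vector>

int main(int argc, char ** argv) {
    llama_backend_init();

    llama_model * model = llama_load_model_from_file(argv[1], llama_model_default_params());

    llama_context_params cparams = llama_context_default_params();
    cparams.embeddings = true; // presumably what --embeddings sets on the server context
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // Decode a trivial one-token batch.
    std::vector<llama_token> tokens = { llama_token_bos(model) };
    llama_batch batch = llama_batch_get_one(tokens.data(), tokens.size(), 0, 0);
    llama_decode(ctx, batch);

    // With embeddings enabled, the context holds embeddings rather than logits,
    // so this logs "invalid logits id ..., reason: no logits" and (in release
    // builds) returns NULL; sampling from that result would then segfault.
    const float * logits = llama_get_logits_ith(ctx, tokens.size() - 1);
    printf("logits: %p\n", (const void *) logits);

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

If the server's completion path performs the equivalent NULL dereference when sampling the next token, that would explain the segmentation fault immediately after the log line.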

Labels

bug-unconfirmed, high severity (used to report high severity bugs in llama.cpp where a malfunction hinders an important workflow)
