LocalAI version:
local-ai --help | grep Version
Version: v2.22.0 (a1634b219a4e52813e70ff07e6376a01449c4515)
Environment, CPU architecture, OS, and Version:
uname -srm
Darwin 23.6.0 arm64
Describe the bug
It's failing to run the model I installed from the default local-ai registry: it reports that the llama-cpp backend failed to load, even though llama.cpp is installed and on the PATH.
To Reproduce
curl -L -O https://github.com/mudler/LocalAI/releases/download/v2.22.0/local-ai-Darwin-arm64
mv local-ai-Darwin-arm64 ~/bin/local-ai
xattr -r -d com.apple.quarantine ~/bin/local-ai
mkdir -p ~/.local/opt/
curl -L -O https://github.com/ggerganov/llama.cpp/releases/download/b3933/llama-b3933-bin-macos-arm64.zip
# note: yes, BSD tar will unzip .zip files
tar xvf llama-b3933-bin-macos-arm64.zip
mv ./build ~/.local/opt/llama-cpp
xattr -r -d com.apple.quarantine ~/.local/opt/llama-cpp
export PATH="$HOME/bin:$HOME/.local/opt/llama-cpp/bin:$PATH"
llama-cli --version
# version: 3933 (f010b77a)
local-ai --help | grep Version
# Version: v2.22.0 (a1634b219a4e52813e70ff07e6376a01449c4515)
local-ai models list
local-ai models install localai@deepseek-coder-v2-lite-instruct
local-ai run
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-coder-v2-lite-instruct",
    "prompt": "A long time ago in a galaxy far, far away",
    "temperature": 0.7
  }'
Expected behavior
The standalone binary should bundle the libraries it needs to load models, and the documentation should call out any special configuration required beyond that.
Logs
2:03AM DBG Setting logging to debug
2:03AM INF Starting LocalAI using 10 threads, with models path: /Users/aj/models
2:03AM INF LocalAI version: v2.22.0 (a1634b219a4e52813e70ff07e6376a01449c4515)
2:03AM DBG guessDefaultsFromFile: template already set name=deepseek-coder-v2-lite-instruct
2:03AM INF Preloading models from /Users/aj/models
Model name: deepseek-coder-v2-lite-instruct
2:03AM DBG Model: deepseek-coder-v2-lite-instruct (config: {PredictionOptions:{Model:DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf Language: Translate:false N:0 TopP:0x14000991170 TopK:0x14000991178 Temperature:0x14000991180 Maxtokens:0x140009911b0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0x140009911a8 TypicalP:0x140009911a0 Seed:0x140009911c8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:deepseek-coder-v2-lite-instruct F16:0x14000991168 Threads:0x14000991160 Debug:0x140009911c0 Roles:map[] Embeddings:0x140009911c1 Backend: TemplateConfig:{Chat:{{.Input -}}
Assistant: # Space is preserved for templating reasons, but line does not end with one for the linter.
ChatMessage:{{if eq .RoleName "user" -}}User: {{.Content }}
{{ end -}}
{{if eq .RoleName "assistant" -}}Assistant: {{.Content}}<|end▁of▁sentence|>{{end}}
{{if eq .RoleName "system" -}}{{.Content}}
{{end -}} Completion:{{.Input}}
Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Video: Image: Audio:} KnownUsecaseStrings:[] KnownUsecases:<nil> PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0x14000991198 MirostatTAU:0x14000991190 Mirostat:0x14000991188 NGPULayers:0x140009911b8 MMap:0x14000990ff8 MMlock:0x140009911c1 LowVRAM:0x140009911c1 Grammar: StopWords:[<|end▁of▁sentence|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0x14000990fe8 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: VallE:{AudioPath:}} CUDA:false DownloadFiles:[] Description: Usage:})
2:03AM DBG Extracting backend assets files to /tmp/localai/backend_data
2:03AM DBG processing api keys runtime update
2:03AM DBG processing external_backends.json
2:03AM DBG external backends loaded from external_backends.json
2:03AM INF core/startup process completed!
2:03AM DBG No configuration file found at /tmp/localai/upload/uploadedFiles.json
2:03AM DBG No configuration file found at /tmp/localai/config/assistants.json
2:03AM DBG No configuration file found at /tmp/localai/config/assistantsFile.json
2:03AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
2:03AM DBG Request received: {"model":"deepseek-coder-v2-lite-instruct","language":"","translate":false,"n":0,"top_p":null,"top_k":null,"temperature":0.7,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"repeat_last_n":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","size":"","prompt":"A long time ago in a galaxy far, far away","instruction":"","input":null,"stop":null,"messages":null,"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"backend":"","model_base_name":""}
2:03AM DBG `input`: &{PredictionOptions:{Model:deepseek-coder-v2-lite-instruct Language: Translate:false N:0 TopP:<nil> TopK:<nil> Temperature:0x14000991dd0 Maxtokens:<nil> Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:<nil> TypicalP:<nil> Seed:<nil> NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Context:context.Background.WithCancel.WithValue(openai.correlationIDKeyType, 5d53ea3a-bb00-4f32-b8af-aa8ecc5ca269) Cancel:0x1047f90a0 File: ResponseFormat:<nil> Size: Prompt:A long time ago in a galaxy far, far away Instruction: Input:<nil> Stop:<nil> Messages:[] Functions:[] FunctionCall:<nil> Tools:[] ToolsChoice:<nil> Stream:false Mode:0 Step:0 Grammar: JSONFunctionGrammarObject:<nil> Backend: ModelBaseName:}
2:03AM DBG guessDefaultsFromFile: template already set name=deepseek-coder-v2-lite-instruct
2:03AM DBG Parameter Config: &{PredictionOptions:{Model:DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf Language: Translate:false N:0 TopP:0x14000991170 TopK:0x14000991178 Temperature:0x14000991dd0 Maxtokens:0x140009911b0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0x140009911a8 TypicalP:0x140009911a0 Seed:0x140009911c8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:deepseek-coder-v2-lite-instruct F16:0x14000991168 Threads:0x14000991160 Debug:0x14000600150 Roles:map[] Embeddings:0x140009911c1 Backend: TemplateConfig:{Chat:{{.Input -}}
Assistant: # Space is preserved for templating reasons, but line does not end with one for the linter.
ChatMessage:{{if eq .RoleName "user" -}}User: {{.Content }}
{{ end -}}
{{if eq .RoleName "assistant" -}}Assistant: {{.Content}}<|end▁of▁sentence|>{{end}}
{{if eq .RoleName "system" -}}{{.Content}}
{{end -}} Completion:{{.Input}}
Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Video: Image: Audio:} KnownUsecaseStrings:[] KnownUsecases:<nil> PromptStrings:[A long time ago in a galaxy far, far away] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0x14000991198 MirostatTAU:0x14000991190 Mirostat:0x14000991188 NGPULayers:0x140009911b8 MMap:0x14000990ff8 MMlock:0x140009911c1 LowVRAM:0x140009911c1 Grammar: StopWords:[<|end▁of▁sentence|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0x14000990fe8 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: VallE:{AudioPath:}} CUDA:false DownloadFiles:[] Description: Usage:}
2:03AM DBG Template found, input modified to: A long time ago in a galaxy far, far away
2:03AM DBG Loading from the following backends (in order): [llama-cpp llama-ggml llama-cpp-fallback rwkv default.metallib piper whisper huggingface bert-embeddings]
2:03AM INF Trying to load the model 'deepseek-coder-v2-lite-instruct' with the backend '[llama-cpp llama-ggml llama-cpp-fallback rwkv default.metallib piper whisper huggingface bert-embeddings]'
2:03AM INF [llama-cpp] Attempting to load
2:03AM INF Loading model 'deepseek-coder-v2-lite-instruct' with backend llama-cpp
2:03AM DBG Loading model in memory from file: /Users/aj/models/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf
2:03AM DBG Loading Model deepseek-coder-v2-lite-instruct with gRPC (file: /Users/aj/models/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf) (backend: llama-cpp): {backendString:llama-cpp model:DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf modelID:deepseek-coder-v2-lite-instruct assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0x14000464908 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
2:03AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
2:03AM DBG GRPC Service for deepseek-coder-v2-lite-instruct will be running at: '127.0.0.1:55725'
2:03AM DBG GRPC Service state dir: /var/folders/7d/97myp70d0qg1lrlykkf7t31r0000gn/T/go-processmanager621157397
2:03AM DBG GRPC Service Started
2:03AM DBG Wait for the service to start up
2:03AM DBG GRPC(deepseek-coder-v2-lite-instruct-127.0.0.1:55725): stderr dyld[51875]: Library not loaded: @rpath/libutf8_validity.dylib
2:03AM DBG GRPC(deepseek-coder-v2-lite-instruct-127.0.0.1:55725): stderr Referenced from: <C90FE9D1-C086-3408-8E69-9761FE960491> /private/tmp/localai/backend_data/backend-assets/lib/libprotobuf.28.2.0.dylib
2:03AM DBG GRPC(deepseek-coder-v2-lite-instruct-127.0.0.1:55725): stderr Reason: tried: '/private/tmp/localai/backend_data/backend-assets/lib/libutf8_validity.dylib' (no such file), '/private/tmp/localai/backend_data/backend-assets/lib/../lib/libutf8_validity.dylib' (no such file), '/private/tmp/localai/backend_data/backend-assets/lib/../lib/libutf8_validity.dylib' (no such file), '/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/tmp/localai/backend_data/backend-assets/lib/libutf8_validity.dylib' (no such file)
2:04AM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:55725: connect: connection refused\""
2:04AM DBG GRPC Service NOT ready
2:04AM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback'
2:04AM DBG Loading model in memory from file: /Users/aj/models/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf
2:04AM DBG Loading Model deepseek-coder-v2-lite-instruct with gRPC (file: /Users/aj/models/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf) (backend: llama-cpp-fallback): {backendString:llama-cpp model:DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf modelID:deepseek-coder-v2-lite-instruct assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0x14000464908 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
2:04AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
2:04AM DBG GRPC Service for deepseek-coder-v2-lite-instruct will be running at: '127.0.0.1:55746'
2:04AM DBG GRPC Service state dir: /var/folders/7d/97myp70d0qg1lrlykkf7t31r0000gn/T/go-processmanager3345443428
2:04AM DBG GRPC Service Started
2:04AM DBG Wait for the service to start up
2:04AM DBG GRPC(deepseek-coder-v2-lite-instruct-127.0.0.1:55746): stderr dyld[51886]: Library not loaded: @rpath/libutf8_validity.dylib
2:04AM DBG GRPC(deepseek-coder-v2-lite-instruct-127.0.0.1:55746): stderr Referenced from: <C90FE9D1-C086-3408-8E69-9761FE960491> /private/tmp/localai/backend_data/backend-assets/lib/libprotobuf.28.2.0.dylib
2:04AM DBG GRPC(deepseek-coder-v2-lite-instruct-127.0.0.1:55746): stderr Reason: tried: '/private/tmp/localai/backend_data/backend-assets/lib/libutf8_validity.dylib' (no such file), '/private/tmp/localai/backend_data/backend-assets/lib/../lib/libutf8_validity.dylib' (no such file), '/private/tmp/localai/backend_data/backend-assets/lib/../lib/libutf8_validity.dylib' (no such file), '/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libutf8_validity.dylib' (no such file), '/tmp/localai/backend_data/backend-assets/lib/libutf8_validity.dylib' (no such file)
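The failure above can be inspected directly once the server has extracted its assets (paths taken from the debug output above; otool ships with the Xcode command line tools):

# the bundled lib directory contains libprotobuf but no libutf8_validity.dylib
ls /tmp/localai/backend_data/backend-assets/lib/
# the bundled libprotobuf declares a dependency on @rpath/libutf8_validity.dylib
otool -L /tmp/localai/backend_data/backend-assets/lib/libprotobuf.28.2.0.dylib
# the LC_RPATH entries show where dyld is told to search for that dependency
otool -l /tmp/localai/backend_data/backend-assets/lib/libprotobuf.28.2.0.dylib | grep -A2 LC_RPATH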
Additional context
In the debug logs it looks like the release is missing at least one library (libutf8_validity.dylib) from its bundled backend assets, and some library search paths are hard-coded to locations on the machine that built the release (e.g. /opt/homebrew/lib) rather than being resolved relative to the extracted asset directory.
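As a stopgap it may be possible to satisfy the lookup by hand; this is an untested sketch that assumes a local protobuf install (e.g. from Homebrew) provides a compatible libutf8_validity.dylib:

# see whether any copy of the library already exists locally (may find nothing)
find /opt/homebrew/lib -name 'libutf8_validity*' 2>/dev/null
# if one is found, drop it where the bundled libprotobuf's rpath can resolve it
# (local-ai re-extracts this directory on startup, so this may need redoing)
cp /opt/homebrew/lib/libutf8_validity.dylib /tmp/localai/backend_data/backend-assets/lib/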