LocalAI gets blocked at "Model already loaded in memory:" after hours of successful inferencing #1017

@robcaulk

Description

LocalAI version:

image: quay.io/go-skynet/local-ai:master-cublas-cuda11

Environment, CPU architecture, OS, and Version:

Kubernetes deployment with above image. Underneath it is an AMD EPYC Milan CPU (16 core) + A4500 GPU

Describe the bug

I start the server, then begin sending one request at a time to it. After hundreds of successful inferences, one blocks at the message below. The GPU then sits hovering above 94% utilization at full power indefinitely, and all subsequent requests time out on the OpenAI client side. Requests are all exactly the same size and type, and I have seen it get stuck on prompts that are extremely short and simple, without any special characters.

It is hard to predict when this will happen, but it seems that it always will eventually. Sometimes it happens within an hour; other times it takes 12 hours.

I am limiting the number of in-flight requests to 1 using my client.

8:02AM DBG Loading model llama-stable from wizardlm-13b-v1.2.ggmlv3.q5_K_M.bin
8:02AM DBG Stopping all backends except 'wizardlm-13b-v1.2.ggmlv3.q5_K_M.bin'
8:02AM DBG Model already loaded in memory: wizardlm-13b-v1.2.ggmlv3.q5_K_M.bin
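
For context, the request pattern is just a sequential loop against LocalAI's OpenAI-compatible chat completions endpoint, roughly like the sketch below. The URL, timeout, and prompt list are placeholders, not my exact client code:

# Minimal sketch of the client loop: one request at a time; the next request
# is only sent after the previous one returns. URL/timeout/prompts are illustrative.
import requests

LOCALAI_URL = "http://localhost:8080/v1/chat/completions"  # adjust for the k8s service

prompts = ["short prompt 1", "short prompt 2"]  # real prompts are all similar in size

for prompt in prompts:
    payload = {
        "model": "wizard13B_gpu",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    resp = requests.post(LOCALAI_URL, json=payload, timeout=600)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])

When the hang occurs, one of these calls simply never returns and every later call hits the timeout.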

To Reproduce

Here is my model YAML:

backend: llama-stable
context_size: 4096
batch: 512
threads: 1
f16: true
gpu_layers: 43
mmlock: true
name: wizard13B_gpu
parameters:
  model: wizardlm-13b-v1.2.ggmlv3.q5_K_M.bin
  temperature: 0.2
  top_p: 0.9
roles:
  assistant: 'ASSISTANT:'
  system: 'SYSTEM:'
  user: 'USER:'
stopwords:
  - "USER:"
  - "</s>"
template:
  chat: wizard_chat
  completion: wizard_completion
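
The YAML above sits in LocalAI's models directory alongside the .bin file and the wizard_chat / wizard_completion templates. Before starting the request loop shown earlier, I sanity-check that the config is registered by listing the models; a sketch follows, with the base URL as a placeholder for my k8s service:

# Sketch: confirm "wizard13B_gpu" shows up in the model list before looping.
import requests

LOCALAI_URL = "http://localhost:8080"

models = requests.get(f"{LOCALAI_URL}/v1/models", timeout=30).json()
print([m["id"] for m in models.get("data", [])])  # should include "wizard13B_gpu"

The request loop from the description is then run against this model until it eventually hangs.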

Expected behavior

I would expect it to keep operating as it does for the first few hundred requests.

Logs

8:02AM DBG Loading model llama-stable from wizardlm-13b-v1.2.ggmlv3.q5_K_M.bin
8:02AM DBG Stopping all backends except 'wizardlm-13b-v1.2.ggmlv3.q5_K_M.bin'
8:02AM DBG Model already loaded in memory: wizardlm-13b-v1.2.ggmlv3.q5_K_M.bin

Additional context

Labels

bug (Something isn't working)
