[Bug]: Librechat doesn't wait until Ollama has loaded model #3330

Open
1 task done
vlbosch opened this issue Jul 12, 2024 · 2 comments
Labels
bug Something isn't working

Comments

vlbosch commented Jul 12, 2024

What happened?

When starting a conversation with a model served via Ollama, the request sometimes fails prematurely with an ETIMEDOUT error. After resending the prompt it is answered correctly, but only because Ollama has finished loading the model by then. With larger models (like Command R Plus), it takes several retries before the model is loaded and the prompt succeeds.

Steps to Reproduce

  1. Start a chat with an Ollama-served model that is not yet loaded into memory
  2. Write a prompt
  3. Send the prompt
  4. See the generic error message appear

What browsers are you seeing the problem on?

Safari

Relevant log output

LibreChat error log:
{"cause":{"code":"ETIMEDOUT","errno":"ETIMEDOUT","message":"request to http://host.docker.internal:11434/v1/chat/completions failed, reason: read ETIMEDOUT","type":"system"},"level":"error","message":"[handleAbortError] AI response error; aborting request: Connection error.","stack":"Error: Connection error.\n    at OpenAI.makeRequest (/app/api/node_modules/openai/core.js:292:19)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async ChatCompletionStream._createChatCompletion (/app/api/node_modules/openai/lib/ChatCompletionStream.js:53:24)\n    at async ChatCompletionStream._runChatCompletion (/app/api/node_modules/openai/lib/AbstractChatCompletionRunner.js:314:16)"}

LibreChat debug log:
2024-07-12T05:36:03.543Z debug: [OpenAIClient] chatCompletion
{
  baseURL: "http://host.docker.internal:11434/v1",
    modelOptions.model: "gemma-2-27b-it-Q8_0_L.gguf:latest",
    modelOptions.temperature: 0.7,
    modelOptions.top_p: 1,
    modelOptions.presence_penalty: 0,
    modelOptions.frequency_penalty: 0,
    modelOptions.stop: undefined,
    modelOptions.max_tokens: undefined,
    modelOptions.user: "668d8e6941ec54b9987a6bcc",
    modelOptions.stream: true,
    // 2 message(s)
    modelOptions.messages: [{"role":"system","name":"instructions","content":"Instructions:\nAntwoord uitsluitend in het Nederla... [truncated],{"role":"user","content":"Dit is een test."}],
}
2024-07-12T05:36:03.548Z debug: Making request to http://host.docker.internal:11434/v1/chat/completions
2024-07-12T05:36:15.288Z debug: Making request to http://host.docker.internal:11434/v1/chat/completions
2024-07-12T05:36:27.455Z debug: Making request to http://host.docker.internal:11434/v1/chat/completions
2024-07-12T05:36:38.743Z warn: [OpenAIClient.chatCompletion][stream] API error
2024-07-12T05:36:38.743Z error: [handleAbortError] AI response error; aborting request: Connection error.
2024-07-12T05:36:38.747Z debug: [AskController] Request closed
2024-07-12T06:27:35.786Z debug: [AskController]

Ollama server log:
level=WARN source=server.go:570 msg="client connection closed before server finished loading, aborting load"
level=ERROR source=sched.go:480 msg="error loading llama server" error="timed out waiting for llama runner to start: context canceled"
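
The Ollama log above points at the cold start as the trigger: the model load is aborted because the client connection is closed before loading finishes. A possible workaround in the meantime (a sketch, not a LibreChat fix) is to pre-load the model so the /v1/chat/completions request never hits the cold-start window. Per the Ollama API, a generate request without a prompt just loads the model and keeps it resident for the keep_alive duration; a rough sketch using Node's built-in fetch, with the model name taken from the debug log above:

// Warm-up sketch (Node 18+ / ESM): run once before the first chat request.
// A generate call without a prompt only loads the model into memory.
await fetch("http://host.docker.internal:11434/api/generate", {
  method: "POST",
  body: JSON.stringify({
    model: "gemma-2-27b-it-Q8_0_L.gguf:latest", // model from the debug log
    keep_alive: "1h",                           // keep it resident for an hour
  }),
});

Setting OLLAMA_KEEP_ALIVE on the Ollama server should have the same effect as a default for all models.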

Screenshots

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
vlbosch added the bug label on Jul 12, 2024

vlbosch commented Jul 13, 2024

I tested a bit more. Even when the model is already loaded, a timeout still occurs with a large prompt. I think the timeout should be removed when using Ollama, and an error should only be shown when Ollama itself returns one.
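
For context, the stack trace in the error log goes through the openai Node SDK, which accepts timeout (in milliseconds) and maxRetries options when the client is constructed. A minimal sketch of what a relaxed timeout could look like, assuming LibreChat exposed such an option for custom endpoints (the values below are illustrative, not an existing LibreChat setting):

import OpenAI from "openai";

// Sketch only: these are standard openai-node client options, not an existing
// LibreChat setting. A long timeout covers Ollama's cold model load, and
// maxRetries: 0 lets Ollama's own error surface instead of the silent
// re-requests visible in the debug log above.
const client = new OpenAI({
  baseURL: "http://host.docker.internal:11434/v1",
  apiKey: "ollama",         // Ollama ignores the key, but the SDK requires one
  timeout: 10 * 60 * 1000,  // 10 minutes
  maxRetries: 0,
});

const stream = await client.chat.completions.create({
  model: "gemma-2-27b-it-Q8_0_L.gguf:latest",
  messages: [{ role: "user", content: "Dit is een test." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}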


vlbosch commented Aug 21, 2024

@danny-avila Did you have a chance to look into this issue? It would be great if timeouts for custom endpoints could be changed and/or disabled completely. Thanks!
