What happened?
When starting a conversation with a model served via Ollama, the prompt sometimes fails prematurely with an ETIMEDOUT error. After resending the prompt, it is answered correctly, but only because the model has finished loading by then. With larger models (like Command R Plus), it takes several retries.
Steps to Reproduce
1. Start a chat with an Ollama-served model.
2. Enter a prompt.
3. Send the prompt.
4. See the general error message appear (a scripted reproduction sketch follows these steps).
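The stack trace in the error log below shows the request going through the openai Node SDK, so the failure can also be sketched outside the browser. A minimal reproduction sketch, assuming Ollama is reachable at the URL from the logs, the model is installed but not yet loaded into memory, and a deliberately short client-side timeout standing in for whatever limit produces the ETIMEDOUT (the exact mechanism inside LibreChat is not confirmed here):

```ts
import OpenAI from 'openai';

// Point the SDK at Ollama's OpenAI-compatible endpoint. The 12 s timeout is
// a hypothetical value chosen to match the ~12 s gaps between the
// "Making request" lines in the debug log; maxRetries: 2 is the SDK default
// and matches the three attempts seen there.
const client = new OpenAI({
  baseURL: 'http://host.docker.internal:11434/v1',
  apiKey: 'ollama', // Ollama ignores the key, but the SDK requires one
  timeout: 12_000,
  maxRetries: 2,
});

async function main(): Promise<void> {
  const stream = await client.chat.completions.create({
    model: 'gemma-2-27b-it-Q8_0_L.gguf:latest',
    messages: [{ role: 'user', content: 'Dit is een test.' }],
    stream: true,
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}

// While the model is still loading, this is expected to end in a timeout
// error after the retries are exhausted; once the model is warm, the same
// call streams the answer normally.
main().catch(console.error);
```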
What browsers are you seeing the problem on?
Safari
Relevant log output
LibreChat error log:
{"cause":{"code":"ETIMEDOUT","errno":"ETIMEDOUT","message":"request to http://host.docker.internal:11434/v1/chat/completions failed, reason: read ETIMEDOUT","type":"system"},"level":"error","message":"[handleAbortError] AI response error; aborting request: Connection error.","stack":"Error: Connection error.\n at OpenAI.makeRequest (/app/api/node_modules/openai/core.js:292:19)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async ChatCompletionStream._createChatCompletion (/app/api/node_modules/openai/lib/ChatCompletionStream.js:53:24)\n at async ChatCompletionStream._runChatCompletion (/app/api/node_modules/openai/lib/AbstractChatCompletionRunner.js:314:16)"}
LibreChat debug log:
2024-07-12T05:36:03.543Z debug: [OpenAIClient] chatCompletion
{
baseURL: "http://host.docker.internal:11434/v1",
modelOptions.model: "gemma-2-27b-it-Q8_0_L.gguf:latest",
modelOptions.temperature: 0.7,
modelOptions.top_p: 1,
modelOptions.presence_penalty: 0,
modelOptions.frequency_penalty: 0,
modelOptions.stop: undefined,
modelOptions.max_tokens: undefined,
modelOptions.user: "668d8e6941ec54b9987a6bcc",
modelOptions.stream: true,
// 2 message(s)
modelOptions.messages: [{"role":"system","name":"instructions","content":"Instructions:\nAntwoord uitsluitend in het Nederla... [truncated],{"role":"user","content":"Dit is een test."}],
}
2024-07-12T05:36:03.548Z debug: Making request to http://host.docker.internal:11434/v1/chat/completions
2024-07-12T05:36:15.288Z debug: Making request to http://host.docker.internal:11434/v1/chat/completions
2024-07-12T05:36:27.455Z debug: Making request to http://host.docker.internal:11434/v1/chat/completions
2024-07-12T05:36:38.743Z warn: [OpenAIClient.chatCompletion][stream] API error
2024-07-12T05:36:38.743Z error: [handleAbortError] AI response error; aborting request: Connection error.
2024-07-12T05:36:38.747Z debug: [AskController] Request closed
2024-07-12T06:27:35.786Z debug: [AskController]

Ollama server log:
level=WARN source=server.go:570 msg="client connection closed before server finished loading, aborting load"
level=ERROR source=sched.go:480 msg="error loading llama server" error="timed out waiting for llama runner to start: context canceled"
Screenshots
No response
Code of Conduct
I agree to follow this project's Code of Conduct
I tested a bit more. Even when the model is already loaded, a timeout still occurs when it is sent a large prompt. I think the timeout should be removed when using Ollama, and an error should only be shown when Ollama itself reports one.
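For what it's worth, the openai Node SDK that appears in the stack trace already exposes the two knobs this suggestion needs; whether and how LibreChat wires them through to custom endpoints is the open question. A sketch of a relaxed configuration, with illustrative values:

```ts
import OpenAI from 'openai';

// Sketch: relax the client-side limits so slow model loads and long prompt
// processing do not abort the request. The values below are illustrative.
const client = new OpenAI({
  baseURL: 'http://host.docker.internal:11434/v1',
  apiKey: 'ollama',
  timeout: 15 * 60 * 1000, // 15 minutes, long enough for a large model to load
  maxRetries: 0,           // surface Ollama's own error instead of retrying silently
});

// The SDK also accepts a per-request override, so a longer timeout could be
// applied only to the first request after a model switch:
const completion = await client.chat.completions.create(
  {
    model: 'gemma-2-27b-it-Q8_0_L.gguf:latest',
    messages: [{ role: 'user', content: 'Dit is een test.' }],
  },
  { timeout: 30 * 60 * 1000 },
);
```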
@danny-avila Did you have a chance to look into this issue? It would be great if timeouts for custom endpoints could be changed and/or disabled completely. Thanks!
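Until that is configurable, a possible workaround on the Ollama side (my assumption, based on Ollama's documented preload behaviour and keep_alive option, not something LibreChat provides) is to warm the model up before starting the chat, so the first request never hits a cold load:

```ts
// Warm-up sketch: an /api/generate call without a prompt makes Ollama load
// the model, and keep_alive: -1 keeps it in memory indefinitely.
// Assumes Node 18+ (global fetch); adjust the URL and model name to your setup.
await fetch('http://host.docker.internal:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gemma-2-27b-it-Q8_0_L.gguf:latest',
    keep_alive: -1,
  }),
});
```

This sidesteps the cold-start timeouts, though it would not help with the large-prompt case described above.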