Ollama: Specify keep_alive via settings #17906

Merged (2 commits) on Sep 16, 2024
6 changes: 4 additions & 2 deletions crates/language_model/src/provider/ollama.rs
@@ -4,7 +4,7 @@ use gpui::{AnyView, AppContext, AsyncAppContext, ModelContext, Subscription, Tas
use http_client::HttpClient;
use ollama::{
get_models, preload_model, stream_chat_completion, ChatMessage, ChatOptions, ChatRequest,
-ChatResponseDelta, OllamaToolCall,
+ChatResponseDelta, KeepAlive, OllamaToolCall,
};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
@@ -42,6 +42,8 @@ pub struct AvailableModel {
pub display_name: Option<String>,
/// The Context Length parameter to the model (aka num_ctx or n_ctx)
pub max_tokens: usize,
+/// How long the model stays loaded after the last request (seconds, or a duration string like "5m")
+pub keep_alive: Option<KeepAlive>,
}

pub struct OllamaLanguageModelProvider {
@@ -156,7 +158,7 @@ impl LanguageModelProvider for OllamaLanguageModelProvider {
name: model.name.clone(),
display_name: model.display_name.clone(),
max_tokens: model.max_tokens,
-keep_alive: None,
+keep_alive: model.keep_alive.clone(),
},
);
}
2 changes: 2 additions & 0 deletions docs/src/assistant/configuration.md
@@ -152,6 +152,8 @@ Depending on your hardware or use-case you may wish to limit or increase the con

If you specify a context length that is too large for your hardware, Ollama will log an error. You can watch these logs by running: `tail -f ~/.ollama/logs/ollama.log` (macOS) or `journalctl -u ollama -f` (Linux). Depending on the memory available on your machine, you may need to adjust the context length to a smaller value.

+You may also optionally specify a `keep_alive` value for each available model. This can be an integer (seconds) or a string duration like "5m", "10m", "1h", "1d", etc. For example, `"keep_alive": "120s"` lets the server unload the model (freeing GPU VRAM) after 120 seconds.
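
For illustration (not part of this diff), a settings entry using the new field might look like the sketch below. The `language_models.ollama.available_models` nesting is assumed from Zed's settings layout, and the model names and values are placeholders; the per-model fields mirror the `AvailableModel` struct above:

```json
{
  "language_models": {
    "ollama": {
      "available_models": [
        {
          "name": "llama3.1:latest",
          "display_name": "Llama 3.1",
          "max_tokens": 16384,
          "keep_alive": "10m"
        },
        {
          "name": "mistral:latest",
          "display_name": "Mistral 7B",
          "max_tokens": 8192,
          "keep_alive": 300
        }
      ]
    }
  }
}
```

Per Ollama's API documentation, a negative `keep_alive` keeps the model loaded indefinitely, while `0` unloads it immediately after a response is generated.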

### OpenAI {#openai}

1. Visit the OpenAI platform and [create an API key](https://platform.openai.com/account/api-keys)