Ollama connector blueprint #4160
Merged
docs/remote_inference_blueprints/ollama_connector_chat_blueprint.md (229 additions)

# Ollama (OpenAI compatible) connector blueprint example for chat

This is an AI connector blueprint for Ollama or any other local/self-hosted LLM server, as long as it exposes an OpenAI-compatible API (Ollama, llama.cpp, vLLM, etc.).

## 1. Add connector endpoint to trusted URLs

Adjust the regex to match your local endpoint. The following example allows all URLs; a narrower pattern is sketched right after it.

```json
PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.trusted_connector_endpoints_regex": [
      ".*$"
    ]
  }
}
```

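If you prefer not to trust every URL, a narrower pattern can be used instead. The following is a minimal sketch that only matches the local Ollama endpoint used later in this blueprint (the host and port are assumptions; adjust them to your setup):

```json
PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.trusted_connector_endpoints_regex": [
      "^https?://127\\.0\\.0\\.1:11434/.*$"
    ]
  }
}
```
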
## 2. Enable private addresses

Connectors do not reach private or loopback addresses by default. Since the model in this blueprint is served on `127.0.0.1`, enable private IPs:

```json
PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.connector.private_ip_enabled": true
  }
}
```

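To confirm that both settings took effect, you can read the cluster settings back (a standard settings call, not specific to this blueprint):

```json
GET /_cluster/settings?flat_settings=true
```
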
## 3. Create the connector

In a local setting, `openAI_key` might not be needed. In that case, you can either set it to a placeholder value or remove it entirely; if you remove it, you must also update the `Authorization` header in `actions` (see the variant after the next snippet).

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "<YOUR CONNECTOR NAME>",
  "description": "<YOUR CONNECTOR DESCRIPTION>",
  "version": "<YOUR CONNECTOR VERSION>",
  "protocol": "http",
  "parameters": {
    "endpoint": "127.0.0.1:11434",
    "model": "qwen3:4b"
  },
  "credential": {
    "openAI_key": "<YOUR API KEY HERE IF NEEDED>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/v1/chat/completions",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
    }
  ]
}
```

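If your local server requires no authentication at all, a minimal sketch of the same call without the `credential` block and `Authorization` header might look like this (same endpoint and model as above; it is an untested assumption that your cluster accepts a connector without credentials):

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "<YOUR CONNECTOR NAME>",
  "description": "<YOUR CONNECTOR DESCRIPTION>",
  "version": "<YOUR CONNECTOR VERSION>",
  "protocol": "http",
  "parameters": {
    "endpoint": "127.0.0.1:11434",
    "model": "qwen3:4b"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/v1/chat/completions",
      "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
    }
  ]
}
```
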
### Sample response

```json
{
  "connector_id": "Keq5FpkB72uHgF272LWj"
}
```

## 4. Register and deploy the model

Getting a model to work is a two-step process: register, then deploy.

### A: Register and deploy in two steps

One way to do this is a `_register` call followed by a `_deploy` call. First, register the model:

```json
POST /_plugins/_ml/models/_register
{
  "name": "Local LLM Model",
  "function_name": "remote",
  "description": "Ollama model",
  "connector_id": "Keq5FpkB72uHgF272LWj"
}
```

You get a response like this:

```json
{
  "task_id": "oEdPqZQBQwAL8-GOCJbw",
  "status": "CREATED",
  "model_id": "oUdPqZQBQwAL8-GOCZYL"
}
```

Take note of the `model_id`; it is needed for the `_deploy` call.

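Registration is asynchronous. If you want to watch it complete, you can poll the task with the `task_id` from the response (standard ML Commons tasks API; `<TASK_ID>` is a placeholder):

```json
GET /_plugins/_ml/tasks/<TASK_ID>
```
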
Then deploy, using the `model_id` from the register response in place of the `<MODEL_ID>` placeholder:

```json
POST /_plugins/_ml/models/<MODEL_ID>/_deploy
```

#### Sample response

Once you get a response like this, your model is ready to use.

```json
{
  "task_id": "oEdPqZQBQwAL8-GOCJbw",
  "status": "CREATED",
  "model_id": "oUdPqZQBQwAL8-GOCZYL"
}
```

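You can also verify the deployment directly. The get-model call below is a standard ML Commons API; once the model is up, its response should include a `model_state` of `DEPLOYED`:

```json
GET /_plugins/_ml/models/<MODEL_ID>
```
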
### B: Register and deploy in a single step

Alternatively, you can do both steps at once with `deploy=true`:

```json
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "Local LLM Model",
  "function_name": "remote",
  "description": "Ollama model",
  "connector_id": "Keq5FpkB72uHgF272LWj"
}
```

#### Sample response

Once you get a response like this, your model is ready to use.

```json
{
  "task_id": "oEdPqZQBQwAL8-GOCJbw",
  "status": "CREATED",
  "model_id": "oUdPqZQBQwAL8-GOCZYL"
}
```

## 5. Corresponding Predict request example

Note that you have to provide the whole OpenAI-style `messages` array, not just the message to send. Use your `model_id` in place of the `<MODEL_ID>` placeholder.

```json
POST /_plugins/_ml/models/<MODEL_ID>/_predict
{
  "parameters": {
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Why is the sky blue"
      }
    ]
  }
}
```

### Sample response

```json
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "choices": [
              {
                "finish_reason": "stop",
                "index": 0,
                "message": {
                  "role": "assistant",
                  "content": """The sky appears blue due to a phenomenon called Rayleigh scattering. Here's a simple explanation:

1. **Sunlight Composition**: Sunlight appears white, but it's actually a mix of all colors of the visible spectrum (red, orange, yellow, green, blue, indigo, violet).

2. **Atmospheric Scattering**: When sunlight enters Earth's atmosphere, it interacts with the gas molecules and tiny particles in the air. Shorter wavelengths of light (like blue and violet) are scattered more than other colors because they travel in shorter, smaller waves.

3. **Why Blue Dominates**: Although violet light is scattered even more than blue light, the sky appears blue, not violet, because:
   - Our eyes are more sensitive to blue light than violet light.
   - The sun emits more blue light than violet light.
   - Some of the violet light gets absorbed by the upper atmosphere.

4. **Time of Day**: The sky appears blue during the day because we're seeing the scattered blue light from all directions. At sunrise or sunset, the light has to pass through more of the atmosphere, scattering the blue light away and leaving mostly red and orange hues.

This scattering effect is named after Lord Rayleigh, who mathematically described the phenomenon in the 19th century."""
                }
              }
            ],
            "created": 1757369906,
            "model": "qwen3:4b",
            "system_fingerprint": "b6259-cebb30fb",
            "object": "chat.completion",
            "usage": {
              "completion_tokens": 264,
              "prompt_tokens": 563,
              "total_tokens": 827
            },
            "id": "chatcmpl-iHioFpaxa8K2SXgAHd4FhQnbewLQ9PjB",
            "timings": {
              "prompt_n": 563,
              "prompt_ms": 293.518,
              "prompt_per_token_ms": 0.5213463587921847,
              "prompt_per_second": 1918.1106439809487,
              "predicted_n": 264,
              "predicted_ms": 5084.336,
              "predicted_per_token_ms": 19.258848484848485,
              "predicted_per_second": 51.92418439693993
            }
          }
        }
      ]
    }
  ],
  "status_code": 200
}
```

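If you want to remove the test setup afterwards, the model must be undeployed before it (and then the connector) can be deleted. These are standard ML Commons calls; `<MODEL_ID>` and `<CONNECTOR_ID>` are the IDs from the responses above:

```json
POST /_plugins/_ml/models/<MODEL_ID>/_undeploy

DELETE /_plugins/_ml/models/<MODEL_ID>

DELETE /_plugins/_ml/connectors/<CONNECTOR_ID>
```
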
Maybe you can do register with deploy set to true, which saves one step: `POST /_plugins/_ml/models/_register?deploy=true`
I had that originally and then removed it because, IMO, `POST /_plugins/_ml/models/_register?deploy=true` gives a sense of *automagic* things happening. With the explicit `_deploy` POST, readers have to understand that there is this step (which you can automate) of deploying the connector.

After your comment about the tutorial, I think adding all the options in the tutorial to keep the blueprint cleaner would be a good one. But I'm open to exactly the opposite: making the blueprint very clean, even with such automations in place (and a note saying that they are there), and making the tutorial more dense, with explanations about deploying models (and other things).

What option do you prefer?

PS: Just for reference, my first commit didn't have any deploy step because I have auto-deploy in my test cluster; that is something I would like to avoid for new readers.
It would be nice to have both options, so the blueprint is friendly for starters while also giving a good tip for advanced users who want to speed things up.
I will do that.