All durations are returned in nanoseconds.

### Streaming responses

Certain endpoints stream responses as JSON objects delimited by newline (`\n`) characters.
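
For example, the start of a streamed reply arrives as one JSON object per line. A sketch of what two such lines might look like (all values are placeholders, and the `created_at` and `done` fields are assumed from context rather than documented in this excerpt):

```json
{"model":"llama2","created_at":"2023-08-04T08:52:19.385406Z","response":"The","done":false}
{"model":"llama2","created_at":"2023-08-04T08:52:20.117895Z","response":" sky","done":false}
```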
## Generate a completion

```shell
POST /api/generate
```

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

### Parameters

- `model`: (required) the [model name](#model-names)
- `prompt`: the prompt to generate a response for

Advanced parameters (optional):

- `format`: the format to return a response in. Currently the only accepted value is `json`
- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values), such as `temperature`
- `system`: system prompt (overrides what is defined in the `Modelfile`)
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `context`: the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory
- `stream`: if `false`, the response will be returned as a single response object rather than a stream of objects
- `raw`: if `true`, no formatting will be applied to the prompt and no context will be returned. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API and are managing history yourself.
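
For instance, a single request combining several of these parameters might look like this (the `system` text and the `options` values are only illustrative):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "system": "You are a concise assistant.",
  "options": {
    "temperature": 0.8
  },
  "stream": false
}'
```
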
### JSON mode

Enable JSON mode by setting the `format` parameter to `json`. This will structure the response as a valid JSON object.

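For example (it helps to also instruct the model in the prompt itself to respond in JSON; the prompt shown is illustrative):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "What color is the sky? Respond using JSON.",
  "format": "json",
  "stream": false
}'
```
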
### Examples

#### Request

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```

The final response in the stream also includes additional data about the generation:

- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
- `eval_count`: number of tokens in the response
- `eval_duration`: time in nanoseconds spent generating the response
- `context`: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
- `response`: empty if the response was streamed; if not streamed, this will contain the full response

To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by `10^9`, since `eval_duration` is in nanoseconds. For example, 100 tokens generated over an `eval_duration` of 1.25 × 10^9 ns (1.25 s) works out to 80 token/s.

#### Request (No streaming)

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```
@@ -147,9 +144,9 @@ If `stream` is set to `false`, the response will be a single JSON object:
147
144
}
148
145
```
149
146
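
A sketch of that object's shape; every value below is a placeholder, and the fields not listed above (`created_at`, `done`, `total_duration`, `prompt_eval_count`) are assumed from context rather than documented in this excerpt:

```json
{
  "model": "llama2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "response": "The sky is blue because of Rayleigh scattering...",
  "context": [1, 2, 3],
  "done": true,
  "total_duration": 5589157167,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
```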

#### Request (Raw mode)

In some cases you may wish to bypass the templating system and provide a full prompt. In this case, you can use the `raw` parameter to disable formatting and context.
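
A sketch of such a request, assuming a model whose template expects `[INST]` markers (the correct raw prompt format depends entirely on the model in use):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "[INST] Why is the sky blue? [/INST]",
  "raw": true,
  "stream": false
}'
```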

## Generate a chat completion

Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

### Parameters

- `model`: (required) the [model name](#model-names)
- `messages`: the messages of the chat; this can be used to keep a chat memory

Advanced parameters (optional):

- `format`: the format to return a response in. Currently the only accepted value is `json`
- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values), such as `temperature`
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `stream`: if `false`, the response will be returned as a single response object rather than a stream of objects