* updated server readme to reflect the gg/server-token-probs-4088 commit
added explanation for the API's completion result, which now includes `completion_probabilities`. Also added a JSON schema that shows the type/structure of `completion_probabilities`.
* simplified the `completion_probabilities` JSON schema
It's now easier to understand what the structure of `completion_probabilities` looks like.
* minor : fix trailing whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Changed file: examples/server/README.md (34 additions, 25 deletions). The updated section now reads:
`system_prompt`: Change the system prompt (initial prompt of all slots), this is useful for chat applications. [See more](#change-system-prompt-on-runtime)

### Result JSON:

* Note: When using streaming mode (`stream`), only `content` and `stop` will be returned until the end of completion.

- `completion_probabilities`: An array of token probabilities, one entry per generated token, so the array's length is `n_predict`. Each item in the array has the following structure:

```
{
  "content": "<the token selected by the model>",
  "probs": [
    {
      "prob": float,
      "tok_str": "<most likely token>"
    },
    {
      "prob": float,
      "tok_str": "<second most likely token>"
    },
    ...
  ]
},
```

Notice that each `probs` is an array of length `n_probs`.
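As a quick illustration, here is a minimal sketch of requesting these probabilities from a running server. It assumes a server listening on the default `http://localhost:8080` and uses the third-party `requests` package; the field names follow the schema above.

```python
# Query /completion with n_probs > 0 and inspect completion_probabilities.
# Assumes a llama.cpp server is already running on localhost:8080 (the default).
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Building a website can be done in 10 simple steps:",
        "n_predict": 8,
        "n_probs": 3,  # attach the top-3 candidates for every generated token
    },
)
resp.raise_for_status()
result = resp.json()

print(result["content"])
for item in result.get("completion_probabilities", []):
    # item["content"] is the token the model actually selected;
    # item["probs"] lists the n_probs most likely candidates for that position.
    candidates = ", ".join(
        f"{p['tok_str']!r}: {p['prob']:.3f}" for p in item["probs"]
    )
    print(f"selected {item['content']!r} <- [{candidates}]")
```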
- `content`: Completion result as a string (excluding `stopping_word` if any). In streaming mode, this will contain the next token as a string.

- `stop`: Boolean for use with `stream` to check whether the generation has stopped (Note: this is not related to the stopping words array `stop` from the input options)

- `generation_settings`: The provided options above excluding `prompt` but including `n_ctx`, `model`

- `model`: The path to the model loaded with `-m`

- `prompt`: The provided `prompt`

- `stopped_eos`: Indicating whether the completion has stopped because it encountered the EOS token

- `stopped_limit`: Indicating whether the completion stopped because `n_predict` tokens were generated before stop words or EOS was encountered

- `stopped_word`: Indicating whether the completion stopped due to encountering a stopping word from the `stop` JSON array provided

- `stopping_word`: The stopping word encountered which stopped the generation (or "" if not stopped due to a stopping word)

- `timings`: Hash of timing information about the completion such as the number of tokens `predicted_per_second`

- `tokens_cached`: Number of tokens from the prompt which could be re-used from a previous completion (`n_past`)

- `tokens_evaluated`: Number of tokens evaluated in total from the prompt

- `truncated`: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens_predicted`) exceeded the context size (`n_ctx`)
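Per the streaming note above, a client reassembles the completion from per-token chunks and watches `stop` to know when generation has finished. A minimal sketch of a streaming consumer, assuming the server emits SSE-style JSON lines prefixed with `data: ` (as the bundled web client consumes):

```python
# Consume a streamed completion: each chunk is a JSON object carrying
# "content" (the next token) and "stop" (true on the final chunk).
import json
import requests

with requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "Once upon a time", "n_predict": 32, "stream": True},
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue  # skip keep-alive blank lines between events
        if line.startswith(b"data: "):
            line = line[len(b"data: "):]  # strip the SSE prefix, if present
        chunk = json.loads(line)
        print(chunk["content"], end="", flush=True)
        if chunk["stop"]:
            break
print()
```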