
Commit 128de35

ibehnam and ggerganov authored
server : update readme about token probs (#4777)
* updated server readme to reflect the gg/server-token-probs-4088 commit

  Added explanation for the API's completion result, which now includes `completion_probabilities`. Also added a JSON schema that shows the type/structure of `completion_probabilities`.

* simplified the `completion_probabilities` JSON schema

  It's now easier to understand what the structure of `completion_probabilities` looks like.

* minor : fix trailing whitespace

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 parent 8c58330 · commit 128de35


examples/server/README.md

Lines changed: 34 additions & 25 deletions
````diff
@@ -175,35 +175,44 @@ node index.js
 
 `system_prompt`: Change the system prompt (initial prompt of all slots), this is useful for chat applications. [See more](#change-system-prompt-on-runtime)
 
-*Result JSON:*
+### Result JSON:
 
-Note: When using streaming mode (`stream`) only `content` and `stop` will be returned until end of completion.
+* Note: When using streaming mode (`stream`) only `content` and `stop` will be returned until end of completion.
 
-`content`: Completion result as a string (excluding `stopping_word` if any). In case of streaming mode, will contain the next token as a string.
 
-`stop`: Boolean for use with `stream` to check whether the generation has stopped (Note: This is not related to stopping words array `stop` from input options)
+- `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has the following structure:
 
-`generation_settings`: The provided options above excluding `prompt` but including `n_ctx`, `model`
-
-`model`: The path to the model loaded with `-m`
-
-`prompt`: The provided `prompt`
-
-`stopped_eos`: Indicating whether the completion has stopped because it encountered the EOS token
-
-`stopped_limit`: Indicating whether the completion stopped because `n_predict` tokens were generated before stop words or EOS was encountered
-
-`stopped_word`: Indicating whether the completion stopped due to encountering a stopping word from `stop` JSON array provided
-
-`stopping_word`: The stopping word encountered which stopped the generation (or "" if not stopped due to a stopping word)
-
-`timings`: Hash of timing information about the completion such as the number of tokens `predicted_per_second`
-
-`tokens_cached`: Number of tokens from the prompt which could be re-used from previous completion (`n_past`)
-
-`tokens_evaluated`: Number of tokens evaluated in total from the prompt
-
-`truncated`: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens predicted`) exceeded the context size (`n_ctx`)
+```
+{
+  "content": "<the token selected by the model>",
+  "probs": [
+    {
+      "prob": float,
+      "tok_str": "<most likely token>"
+    },
+    {
+      "prob": float,
+      "tok_str": "<second most likely token>"
+    },
+    ...
+  ]
+},
+```
+Notice that each `probs` is an array of length `n_probs`.
+
+- `content`: Completion result as a string (excluding `stopping_word` if any). In case of streaming mode, will contain the next token as a string.
+- `stop`: Boolean for use with `stream` to check whether the generation has stopped (Note: This is not related to stopping words array `stop` from input options)
+- `generation_settings`: The provided options above excluding `prompt` but including `n_ctx`, `model`
+- `model`: The path to the model loaded with `-m`
+- `prompt`: The provided `prompt`
+- `stopped_eos`: Indicating whether the completion has stopped because it encountered the EOS token
+- `stopped_limit`: Indicating whether the completion stopped because `n_predict` tokens were generated before stop words or EOS was encountered
+- `stopped_word`: Indicating whether the completion stopped due to encountering a stopping word from `stop` JSON array provided
+- `stopping_word`: The stopping word encountered which stopped the generation (or "" if not stopped due to a stopping word)
+- `timings`: Hash of timing information about the completion such as the number of tokens `predicted_per_second`
+- `tokens_cached`: Number of tokens from the prompt which could be re-used from previous completion (`n_past`)
+- `tokens_evaluated`: Number of tokens evaluated in total from the prompt
+- `truncated`: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens predicted`) exceeded the context size (`n_ctx`)
 
 - **POST** `/tokenize`: Tokenize a given text.
 
````
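To make the new field concrete, here is a minimal client sketch. It is not part of the commit: it assumes a llama.cpp server running at the default `http://localhost:8080` with a `/completion` endpoint, and uses the `n_predict` and `n_probs` options referenced in the README text above; the field names follow the JSON structure shown in the diff. TypeScript, Node 18+ (for the global `fetch`):

```typescript
// Sketch: request a short completion with top-2 token probabilities and
// print each selected token next to its candidates. Endpoint, port, and
// option names are assumptions based on the README text above.

interface TokenProb {
  prob: number;    // probability assigned to this candidate token
  tok_str: string; // the candidate token as a string
}

interface CompletionProbability {
  content: string;    // the token the model actually selected
  probs: TokenProb[]; // top-`n_probs` candidates at this position
}

async function main(): Promise<void> {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt: "Building a website can be done in 10 simple steps:",
      n_predict: 8, // completion_probabilities should have one entry per predicted token
      n_probs: 2,   // ask for the top-2 candidates per token
    }),
  });
  const data = await res.json();

  // Each entry's `probs` array has `n_probs` items, as the README notes.
  for (const tok of data.completion_probabilities as CompletionProbability[]) {
    const candidates = tok.probs
      .map((p) => `${JSON.stringify(p.tok_str)}=${p.prob.toFixed(3)}`)
      .join(", ");
    console.log(`selected ${JSON.stringify(tok.content)} | top: ${candidates}`);
  }
}

main().catch(console.error);
```

Comparing each `content` against the highest-probability entry in its `probs` array is a quick way to see where sampling diverged from the greedy choice.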
