updated server readme to reflect the gg/server-token-probs-4088 commit #4777

Merged 3 commits on Jan 9, 2024
59 changes: 34 additions & 25 deletions examples/server/README.md

`system_prompt`: Change the system prompt (the initial prompt of all slots); this is useful for chat applications. [See more](#change-system-prompt-on-runtime)

### Result JSON:

* Note: When using streaming mode (`stream`), only `content` and `stop` will be returned until the end of the completion.

- `completion_probabilities`: An array of token probabilities, one entry per generated token, so the array's length is `n_predict`. Each item in the array has the following structure (a usage sketch follows the field list below):

```
{
  "content": "<the token selected by the model>",
  "probs": [
    {
      "prob": float,
      "tok_str": "<most likely token>"
    },
    {
      "prob": float,
      "tok_str": "<second most likely token>"
    },
    ...
  ]
},
```
Note that each `probs` array has length `n_probs`.

- `content`: Completion result as a string (excluding `stopping_word`, if any). In streaming mode, contains the next token as a string.
- `stop`: Boolean for use with `stream`, indicating whether generation has stopped (Note: this is not related to the stopping-words array `stop` from the input options)
- `generation_settings`: The provided options above, excluding `prompt` but including `n_ctx` and `model`
- `model`: The path to the model loaded with `-m`
- `prompt`: The provided `prompt`
- `stopped_eos`: Indicates whether the completion stopped because it encountered the EOS token
- `stopped_limit`: Indicates whether the completion stopped because `n_predict` tokens were generated before a stop word or EOS was encountered
- `stopped_word`: Indicates whether the completion stopped due to encountering a stopping word from the provided `stop` JSON array
- `stopping_word`: The stopping word that stopped the generation (or "" if generation did not stop due to a stopping word)
- `timings`: Hash of timing information about the completion, such as the number of tokens `predicted_per_second`
- `tokens_cached`: Number of tokens from the prompt that could be re-used from a previous completion (`n_past`)
- `tokens_evaluated`: Number of tokens evaluated in total from the prompt
- `truncated`: Boolean indicating whether the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens_predicted`) exceeded the context size (`n_ctx`)
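
As a usage illustration (not part of the README text itself), the sketch below streams a completion with token probabilities and prints the fields described above. It is a minimal sketch with several assumptions: a server listening on the default `http://localhost:8080`, Node 18+ for the built-in `fetch`, streamed events arriving as `data: {...}` lines, and `n_probs` being the request option that sets how many alternatives are reported per token.

```
// Minimal sketch, not authoritative. Assumptions: server at the default
// http://localhost:8080, Node 18+ (global fetch), events streamed as
// `data: {...}` lines, and `n_probs` enabling `completion_probabilities`.
async function main() {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt: "Building a website can be done in 10 simple steps:",
      n_predict: 16,
      n_probs: 4,   // assumed option: top-4 alternatives per token
      stream: true, // only `content` and `stop` arrive until the end
    }),
  });

  const decoder = new TextDecoder();
  let buffered = "";
  for await (const chunk of res.body) {
    buffered += decoder.decode(chunk, { stream: true });
    const lines = buffered.split("\n");
    buffered = lines.pop(); // keep any incomplete trailing line
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = JSON.parse(line.slice("data: ".length));
      process.stdout.write(data.content);
      // Guarded: probabilities may be absent depending on server settings.
      for (const item of data.completion_probabilities ?? []) {
        // Each `probs` entry pairs a candidate token with its probability.
        console.error(item.probs.map((p) => `${p.tok_str}=${p.prob.toFixed(3)}`).join(" "));
      }
      if (data.stop) return; // generation finished
    }
  }
}

main();
```

Writing the alternatives to stderr keeps them separate from the generated text on stdout.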

- **POST** `/tokenize`: Tokenize a given text.
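
As a hedged illustration of the call shape (the request and response field names here are assumptions, not confirmed by this excerpt):

```
// Minimal sketch, assuming /tokenize accepts {"content": "..."} and
// returns {"tokens": [...]}. Run with Node 18+ as an ES module
// (top-level await), e.g. `node tokenize.mjs`.
const res = await fetch("http://localhost:8080/tokenize", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ content: "Hello, llama.cpp!" }),
});
const { tokens } = await res.json();
console.log(tokens); // e.g. an array of token ids
```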
