
Commit b52edd2

Authors: ggerganov and ngxson

server : remove n_past (#16818)

* server : remove n_past
* server : replace slot.n_prompt_tokens() with slot.task->n_tokens()
* server : fixes + clean-up
* cont : fix context shift
* server : add server_tokens::pos_next()
* server : fix pos_next() usage

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

1 parent 517b717, commit b52edd2
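The commit message names a new `server_tokens::pos_next()` helper that replaces the separate `n_past` counter. As an illustration only (this is a hypothetical Python sketch, not the actual llama.cpp C++ implementation), the idea of deriving "the position of the next token to decode" from the cached tokens themselves can be sketched as:

```python
# Hypothetical sketch of the pos_next() idea: instead of maintaining a
# separate n_past counter, derive the next KV-cache position from the
# tokens the slot already holds. All names here are illustrative.

class ServerTokens:
    def __init__(self):
        self.pos = []  # position assigned to each cached token

    def push(self, pos):
        self.pos.append(pos)

    def pos_next(self):
        """Position where the next decoded token should go."""
        if not self.pos:
            return 0
        return self.pos[-1] + 1

tokens = ServerTokens()
for p in range(5):        # cache a contiguous 5-token prompt
    tokens.push(p)
print(tokens.pos_next())  # -> 5
```

The appeal of this shape is that there is no counter to keep in sync with the cache contents, which is the kind of bookkeeping bug the "fixes + clean-up" and "fix context shift" items in the message suggest the old `n_past` field invited.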

File tree

3 files changed, +177 -153 lines changed


tools/server/README.md

Lines changed: 2 additions & 2 deletions
@@ -587,7 +587,7 @@ These words will not be included in the completion, so make sure to add them to
 - `word`: Stopped due to encountering a stopping word from `stop` JSON array provided
 - `stopping_word`: The stopping word encountered which stopped the generation (or "" if not stopped due to a stopping word)
 - `timings`: Hash of timing information about the completion such as the number of tokens `predicted_per_second`
-- `tokens_cached`: Number of tokens from the prompt which could be re-used from previous completion (`n_past`)
+- `tokens_cached`: Number of tokens from the prompt which could be re-used from previous completion
 - `tokens_evaluated`: Number of tokens evaluated in total from the prompt
 - `truncated`: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens_predicted`) exceeded the context size (`n_ctx`)
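The `truncated` flag in the hunk above has a simple arithmetic definition: it is set when the prompt tokens evaluated plus the tokens generated exceed the context size. A minimal sketch, with field names taken from the README text and a hypothetical function name:

```python
# Sketch of the `truncated` condition described in the README: true when
# prompt tokens evaluated plus tokens generated exceed the context size
# n_ctx. The function name is illustrative, not a llama.cpp API.

def is_truncated(tokens_evaluated: int, tokens_predicted: int, n_ctx: int) -> bool:
    return tokens_evaluated + tokens_predicted > n_ctx

print(is_truncated(tokens_evaluated=4000, tokens_predicted=200, n_ctx=4096))  # -> True
print(is_truncated(tokens_evaluated=100,  tokens_predicted=50,  n_ctx=4096))  # -> False
```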

@@ -1045,7 +1045,7 @@ Available metrics:
 - `llamacpp:kv_cache_tokens`: KV-cache tokens.
 - `llamacpp:requests_processing`: Number of requests processing.
 - `llamacpp:requests_deferred`: Number of requests deferred.
-- `llamacpp:n_past_max`: High watermark of the context size observed.
+- `llamacpp:n_tokens_max`: High watermark of the context size observed.

 ### POST `/slots/{id_slot}?action=save`: Save the prompt cache of the specified slot to a file.
