Closed
Description
I was playing around with the server example and wanted to expose the probabilities of the generated tokens to the server client, to implement custom stopping sequences and criteria (similar to OpenAI's API here).
All it should take is a variant of "llama_sample_token" and "llama_sample_token_greedy" that returns an object containing the top X tokens and their probabilities.
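To illustrate the idea, here is a minimal self-contained sketch of what such a helper could compute. The function name `top_token_probs` is hypothetical, not part of the llama.cpp API; a real implementation would read the logits via `llama_get_logits()` and reuse the existing sampling structs, but the core is just a softmax over the logits plus a partial sort:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: given raw logits for the whole vocabulary, return the
// top_k (token_id, probability) pairs after a numerically stable softmax.
std::vector<std::pair<int32_t, float>> top_token_probs(
        const std::vector<float>& logits, size_t top_k) {
    // Subtract the max logit before exponentiating to avoid overflow.
    float max_logit = *std::max_element(logits.begin(), logits.end());

    std::vector<std::pair<int32_t, float>> probs(logits.size());
    double sum = 0.0;
    for (size_t i = 0; i < logits.size(); ++i) {
        float e = std::exp(logits[i] - max_logit);
        probs[i] = { (int32_t) i, e };
        sum += e;
    }
    for (auto& tp : probs) {
        tp.second = (float)(tp.second / sum);
    }

    // Only the top_k entries need to be fully ordered.
    size_t k = std::min(top_k, probs.size());
    std::partial_sort(probs.begin(), probs.begin() + k, probs.end(),
                      [](const auto& a, const auto& b) { return a.second > b.second; });
    probs.resize(k);
    return probs;
}
```

The server could then serialize these pairs into the response alongside the sampled token, much like the `logprobs` field in OpenAI's completion API.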
The only related issue/pr/discussion I was able to find is this pr about logging probabilities. Please give me pointers if similar requests have been discussed somewhere.
Since I'm relatively new to the repo, what is the protocol here? Should I just make a PR?
Metadata