Closed
Description
I was playing around with the server example and wanted to expose the probabilities of the generated tokens to the server client, to implement custom stopping sequences and criteria (similar to OpenAI's API here).
All it should take is a variant of "llama_sample_token" and "llama_sample_token_greedy" that returns an object containing the top X tokens and their probabilities.
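To illustrate the idea, here is a minimal self-contained sketch of what such a helper could compute. The function name `top_token_probs` is hypothetical, not part of the llama.cpp API; a real implementation would read the logits via `llama_get_logits()` and reuse the existing sampling structs, but the core is just a softmax over the logits plus a partial sort:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: given raw logits for the whole vocabulary, return the
// top_k (token_id, probability) pairs after a numerically stable softmax.
std::vector<std::pair<int32_t, float>> top_token_probs(
        const std::vector<float>& logits, size_t top_k) {
    // Subtract the max logit before exponentiating to avoid overflow.
    float max_logit = *std::max_element(logits.begin(), logits.end());

    std::vector<std::pair<int32_t, float>> probs(logits.size());
    double sum = 0.0;
    for (size_t i = 0; i < logits.size(); ++i) {
        float e = std::exp(logits[i] - max_logit);
        probs[i] = { (int32_t) i, e };
        sum += e;
    }
    for (auto& tp : probs) {
        tp.second = (float)(tp.second / sum);
    }

    // Only the top_k entries need to be fully ordered.
    size_t k = std::min(top_k, probs.size());
    std::partial_sort(probs.begin(), probs.begin() + k, probs.end(),
                      [](const auto& a, const auto& b) { return a.second > b.second; });
    probs.resize(k);
    return probs;
}
```

The server could then serialize these pairs into the response alongside the sampled token, much like the `logprobs` field in OpenAI's completion API.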
The only related issue/pr/discussion I was able to find is this pr about logging probabilities. Please give me pointers if similar requests have been discussed somewhere.
Since I'm relatively new to the repo, what is the protocol here? Should I just make a PR?
Metadata