
server: add option to output probabilities for completion #1962

Merged · 27 commits · Jul 2, 2023
Changes from 2 commits
Commits (27)
ba210e4
server: add option to output probabilities for completion
WangHaoranRobin Jun 21, 2023
8004e67
Merge pull request #1 from WangHaoranRobin/robin_fork_master
WangHaoranRobin Jun 21, 2023
ccf254b
server: fix comment about max n_probs
WangHaoranRobin Jun 22, 2023
926664c
Merge pull request #2 from WangHaoranRobin/robin_fork_master
WangHaoranRobin Jun 22, 2023
cf76195
server: fix issue when handling probability output for incomplete tok…
WangHaoranRobin Jun 23, 2023
bdb710e
Merge pull request #3 from WangHaoranRobin/robin_fork_master
WangHaoranRobin Jun 23, 2023
7b93b24
server: fix some beginner mistakes
WangHaoranRobin Jun 23, 2023
7cd8fc2
Merge pull request #4 from WangHaoranRobin/robin_fork_master
WangHaoranRobin Jun 23, 2023
6c76c31
Merge branch 'ggerganov:master' into master
WangHaoranRobin Jun 23, 2023
02c96a4
server: remove trailling white space
WangHaoranRobin Jun 24, 2023
7f7046e
Merge pull request #5 from WangHaoranRobin/robin_fork_master
WangHaoranRobin Jun 24, 2023
23b516b
Merge branch 'ggerganov:master' into master
WangHaoranRobin Jun 24, 2023
af058cf
Merge branch 'ggerganov:master' into master
WangHaoranRobin Jun 25, 2023
e815b69
server: remove n_probs upper limit of 5
WangHaoranRobin Jun 25, 2023
bd6550b
Merge pull request #6 from WangHaoranRobin/robin_fork_master
WangHaoranRobin Jun 25, 2023
c9e6642
server: handle probs output when temp=0; handle final response probs …
WangHaoranRobin Jun 25, 2023
13f5d69
Merge branch 'master' into robin_fork_master
WangHaoranRobin Jun 25, 2023
77edee7
Merge pull request #7 from WangHaoranRobin/robin_fork_master
WangHaoranRobin Jun 25, 2023
b5c5c8e
Merge branch 'ggerganov:master' into master
WangHaoranRobin Jun 26, 2023
c7f7f13
Merge branch 'ggerganov:master' into master
WangHaoranRobin Jun 27, 2023
bc88fec
server: fix llama_sample_top_k order
WangHaoranRobin Jun 27, 2023
58828c2
Merge pull request #8 from WangHaoranRobin/robin_fork_master
WangHaoranRobin Jun 27, 2023
1d22550
Merge branch 'ggerganov:master' into master
WangHaoranRobin Jun 28, 2023
ad80773
Merge branch 'ggerganov:master' into master
WangHaoranRobin Jul 1, 2023
1a70a80
examples/common.h: put all bool variables in gpt_params together
WangHaoranRobin Jul 2, 2023
71f8296
examples/common.h: put all bool variables in gpt_params together
WangHaoranRobin Jul 2, 2023
cc3c86f
Merge pull request #9 from WangHaoranRobin/robin_fork_master
WangHaoranRobin Jul 2, 2023
2 changes: 1 addition & 1 deletion examples/common.h
@@ -31,7 +31,7 @@ struct gpt_params {
int32_t main_gpu = 0; // the GPU that is used for scratch and small tensors
float tensor_split[LLAMA_MAX_DEVICES] = {0}; // how split tensors should be distributed across GPUs
bool low_vram = 0; // if true, reduce VRAM usage at the cost of performance
- int32_t n_probs = 0; // if greater than 1, output the probabilities of top n_probs tokens. Max 5
+ int32_t n_probs = 0; // if greater than 0, output the probabilities of top n_probs tokens.

// sampling parameters
std::unordered_map<llama_token, float> logit_bias; // logit bias for specific tokens
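As a usage note: a client opts into this feature per request. The sketch below is only an illustration; it assumes the server's /completion endpoint accepts an n_probs field in its JSON body alongside prompt and n_predict. The endpoint path and field names are assumptions based on this header, not quoted from the diff, so verify them against the server README.

// Hypothetical request body for the server's completion endpoint.
// Endpoint path and JSON field names are assumptions for illustration only.
#include <cstdio>

int main() {
    const char * body = R"({
        "prompt": "Building a website can be done in 10 simple steps:",
        "n_predict": 16,
        "n_probs": 3
    })";
    // POST this body to e.g. http://localhost:8080/completion with an HTTP
    // client of your choice; with n_probs > 0, each generated token should
    // come back with the probabilities of its top 3 candidate tokens.
    printf("%s\n", body);
    return 0;
}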
1 change: 1 addition & 0 deletions examples/server/server.cpp
@@ -354,6 +354,7 @@ struct llama_server_context {
result.tok = llama_sample_token(ctx, &candidates_p);
}
}
+ // Add maximum of 5 most probable tokens to the result
for (size_t i = 0; i < std::min(candidates_p.size, std::min((size_t) n_probs, size_t(5))); ++i) {
result.probs.push_back({candidates_p.data[i].id, candidates_p.data[i].p});
}
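To see the selection logic in isolation, here is a minimal standalone C++ sketch of the same idea. It uses a made-up candidate list instead of llama.cpp's candidates_p and keeps the temporary cap of 5 that this revision still applies:

// Standalone sketch (not llama.cpp code): after sampling, keep the
// (token id, probability) pairs of the most probable candidates,
// capped at 5 as in this revision of the PR.
#include <algorithm>
#include <cstdio>
#include <vector>

struct token_prob {
    int   id; // token id
    float p;  // probability after softmax
};

int main() {
    // Pretend these are the sampler's candidates with normalized probabilities.
    std::vector<token_prob> candidates = {
        {11, 0.40f}, {7, 0.25f}, {42, 0.15f}, {3, 0.10f}, {99, 0.06f}, {5, 0.04f}
    };
    int n_probs = 3; // value of the new gpt_params::n_probs option

    // Sort by descending probability so the first entries are the top tokens.
    std::sort(candidates.begin(), candidates.end(),
              [](const token_prob & a, const token_prob & b) { return a.p > b.p; });

    // Take at most n_probs entries, never more than 5 or than are available.
    std::vector<token_prob> probs;
    const size_t n_take = std::min(candidates.size(), std::min((size_t) n_probs, (size_t) 5));
    for (size_t i = 0; i < n_take; ++i) {
        probs.push_back(candidates[i]);
    }

    for (const auto & tp : probs) {
        printf("token %d: p = %.2f\n", tp.id, tp.p);
    }
    return 0;
}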