Closed
Description
Hi, I am trying to get token probabilities with the latest code from the main branch, compiled with CMake under Linux. Compilation produced some warnings (not important), but after running the server binary and sending an inference request, the completion_probabilities field in the response is empty.
Request:
curl --request POST \
--url http://192.168.41.197:8081/completion \
--header "Content-Type: application/json" \
--data '{"prompt": "Some stories about railway station","n_predict": 256, "n_probs" : 3}'
Response:
{"completion_probabilities":[],"content":" Rail statins ........ some content here","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/home/max/DISK2/llama13b_200K/ggml-model-f16.gguf","n_ctx":512,"n_keep":0,"n_predict":256,"n_probs":3,"penalize_nl":true,"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"seed":4294967295,"stop":[],"stream":false,"temp":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0},"model":"/home/max/DISK2/llama13b_200K/ggml-model-f16.gguf","prompt":"Some stories about railway station","slot_id":0,"stop":true,"stopped_eos":true,"stopped_limit":false,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":3258.927,"predicted_n":55,"predicted_per_second":16.87672046658302,"predicted_per_token_ms":59.253218181818184,"prompt_ms":429.54,"prompt_n":20,"prompt_per_second":46.561437817199796,"prompt_per_token_ms":21.477},"tokens_cached":75,"tokens_evaluated":20,"tokens_predicted":55,"truncated":false}
Where is my mistake? Or is it a bug? A small reproduction script is included below.
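For reference, a minimal Python reproduction of the same request (standard library only; the host, port, prompt and n_probs value are copied from the curl call above) that prints whatever completion_probabilities the server returns:
import json
import urllib.request

# Same endpoint and payload as the curl request above
url = "http://192.168.41.197:8081/completion"
payload = {
    "prompt": "Some stories about railway station",
    "n_predict": 256,
    "n_probs": 3,
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Expected: the top-3 candidate tokens and their probabilities for each
# generated token; observed: an empty list.
print(json.dumps(result.get("completion_probabilities", []), indent=2))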