Closed
Description
Commit: 67be2ce
Windows 10, CPU only.
The server always returns 1 for prompt_n, tokens_evaluated, and n_prompt_tokens_processed when using LLaVA 1.6.
llava-cli returns the correct prompt token count.
From llava-cli:
llama_print_timings: load time = 25007.57 ms
llama_print_timings: sample time = 68.54 ms / 256 runs ( 0.27 ms per token, 3734.94 tokens per second)
llama_print_timings: prompt eval time = 421164.62 ms / 2902 tokens ( 145.13 ms per token, 6.89 tokens per second)
llama_print_timings: eval time = 66393.95 ms / 257 runs ( 258.34 ms per token, 3.87 tokens per second)
llama_print_timings: total time = 511967.49 ms / 3159 tokens
From server through API:
{
......
"timings": {
"predicted_ms": 57040.203,
"predicted_n": 233,
"predicted_per_second": 4.084838197367565,
"predicted_per_token_ms": 244.8077381974249,
"prompt_ms": 429987.864,
"prompt_n": 1,
"prompt_per_second": 0.0023256470326799734,
"prompt_per_token_ms": 429987.864
},
"tokens_cached": 3129,
"tokens_evaluated": 1,
"tokens_predicted": 233,
"truncated": false
}
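Until the counter is fixed, the prompt size can be approximated on the client side. The sketch below assumes that tokens_cached counts the prompt tokens (including the image embedding) plus the generated tokens. That assumption is inferred from the numbers in this report, not confirmed from the server source, and the helper names are hypothetical.

```python
# Workaround sketch: estimate the prompt token count from a /completion response.
# ASSUMPTION: tokens_cached == prompt tokens (incl. image embedding) + generated tokens.

def estimate_prompt_tokens(response: dict) -> int:
    """Estimate prompt tokens as tokens_cached minus tokens_predicted."""
    return response["tokens_cached"] - response["tokens_predicted"]


def prompt_tokens_per_second(prompt_tokens: int, prompt_ms: float) -> float:
    """Convert a token count and a duration in milliseconds to tokens per second."""
    return prompt_tokens / (prompt_ms / 1000.0)


# Values copied from the server response in this report.
response = {
    "timings": {"prompt_ms": 429987.864, "prompt_n": 1},
    "tokens_cached": 3129,
    "tokens_evaluated": 1,
    "tokens_predicted": 233,
}

estimated = estimate_prompt_tokens(response)
rate = prompt_tokens_per_second(estimated, response["timings"]["prompt_ms"])

print(f"reported prompt_n:       {response['timings']['prompt_n']}")
print(f"estimated prompt tokens: {estimated}")
print(f"estimated prompt rate:   {rate:.2f} tokens/s")
```

With the values above the estimate is 2896 prompt tokens at roughly 6.7 tokens per second, which is in the same range as the 2902 tokens and 6.89 tokens per second reported by llava-cli.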
From server console:
encode_image_with_clip: 5 segments encoded in 22462.62 ms
encode_image_with_clip: image embedding created: 2880 tokens
encode_image_with_clip: image encoded in 22495.54 ms by CLIP ( 7.81 ms per image patch)
{"function":"print_timings","level":"INFO","line":260,"msg":"prompt eval time = 429987.86 ms / 1 tokens (429987.86 ms per token, 0.00 tokens per second)","n_prompt_tokens_processed":1,"n_tokens_second":0.0023256470326799734,"slot_id":0,"t_prompt_processing":429987.864,"t_token":429987.864,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":274,"msg":"generation eval time = 57040.20 ms / 233 runs ( 244.81 ms per token, 4.08 tokens per second)","n_decoded":233,"n_tokens_second":4.084838197367565,"slot_id":0,"t_token":244.8077381974249,"t_token_generation":57040.203,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":283,"msg":" total time = 487028.07 ms","slot_id":0,"t_prompt_processing":429987.864,"t_token_generation":57040.203,"t_total":487028.067,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"update_slots","level":"INFO","line":1626,"msg":"slot released","n_cache_tokens":234,"n_ctx":4096,"n_past":3129,"n_system_tokens":0,"slot_id":0,"task_id":0,"tid":"8368","timestamp":1709356420,"truncated":false}
{"function":"log_server_request","level":"INFO","line":2693,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"127.0.0.1","remote_port":55351,"status":200,"tid":"7172","timestamp":1709356420}