Server always incorrectly reports 1 for prompt_n, tokens_evaluated, and n_prompt_tokens_processed when using Llava 1.6. #5863

Closed
@chigkim

Description

Commit 67be2ce
Windows 10, CPU only.

The server always returns 1 for prompt_n, tokens_evaluated, and n_prompt_tokens_processed when using Llava 1.6, whereas llava-cli reports the correct prompt token count.

From llava-cli:

llama_print_timings:    load time =   25007.57 ms
llama_print_timings:    sample time =    68.54 ms /   256 runs   (  0.27 ms per token,  3734.94 tokens per second)
llama_print_timings: prompt eval time =  421164.62 ms /  2902 tokens (  145.13 ms per token,   6.89 tokens per second)
llama_print_timings:    eval time =   66393.95 ms /   257 runs   (  258.34 ms per token,   3.87 tokens per second)
llama_print_timings:     total time =  511967.49 ms /  3159 tokens

From the server, via the API:

{
	......
	"timings": {
		"predicted_ms": 57040.203,
		"predicted_n": 233,
		"predicted_per_second": 4.084838197367565,
		"predicted_per_token_ms": 244.8077381974249,
		"prompt_ms": 429987.864,
		"prompt_n": 1,
		"prompt_per_second": 0.0023256470326799734,
		"prompt_per_token_ms": 429987.864
	},
	"tokens_cached": 3129,
	"tokens_evaluated": 1,
	"tokens_predicted": 233,
	"truncated": false
}
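The timings above are internally consistent with a prompt of one token. The reported prompt_per_second equals prompt_n divided by prompt_ms in seconds (1 / 429.987864 ≈ 0.0023), and prompt_per_token_ms equals prompt_ms divided by prompt_n. The Python sketch below recomputes both values from the response and compares them with an estimate of the real prompt size. The estimate rests on an assumption that this report does not confirm: that tokens_cached (3129) counts every prompt token, image embeddings included, plus the 233 generated tokens, so the prompt is roughly tokens_cached minus tokens_predicted. The helper names prompt_rates and estimated_prompt_tokens are hypothetical.

```python
"""Recompute prompt throughput from the server's timings, once with the
reported prompt_n and once with an estimated prompt token count."""

# Values copied from the server response shown above.
response = {
    "timings": {
        "predicted_n": 233,
        "prompt_ms": 429987.864,
        "prompt_n": 1,
        "prompt_per_second": 0.0023256470326799734,
        "prompt_per_token_ms": 429987.864,
    },
    "tokens_cached": 3129,
    "tokens_evaluated": 1,
    "tokens_predicted": 233,
}


def prompt_rates(prompt_ms: float, n_tokens: int) -> tuple[float, float]:
    """Hypothetical helper: return (ms per token, tokens per second)."""
    return prompt_ms / n_tokens, n_tokens / (prompt_ms / 1000.0)


def estimated_prompt_tokens(resp: dict) -> int:
    """Hypothetical helper. ASSUMPTION: tokens_cached counts all prompt
    tokens (image embeddings included) plus the generated tokens."""
    return resp["tokens_cached"] - resp["tokens_predicted"]


t = response["timings"]
reported = prompt_rates(t["prompt_ms"], t["prompt_n"])
estimated_n = estimated_prompt_tokens(response)
estimated = prompt_rates(t["prompt_ms"], estimated_n)

print(f"reported : {t['prompt_n']} tokens, "
      f"{reported[0]:.2f} ms/token, {reported[1]:.4f} tokens/s")
print(f"estimated: {estimated_n} tokens, "
      f"{estimated[0]:.2f} ms/token, {estimated[1]:.2f} tokens/s")
```

If the assumption holds, the estimate is 2896 prompt tokens at roughly 148 ms per token and 6.7 tokens per second. That is in the same range as llava-cli's 145.13 ms per token and 6.89 tokens per second, whereas the reported prompt_n of 1 yields the implausible 0.0023 tokens per second.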

From the server console:

encode_image_with_clip: 5 segments encoded in 22462.62 ms
encode_image_with_clip: image embedding created: 2880 tokens

encode_image_with_clip: image encoded in 22495.54 ms by CLIP (    7.81 ms per image patch)
{"function":"print_timings","level":"INFO","line":260,"msg":"prompt eval time     =  429987.86 ms /     1 tokens (429987.86 ms per token,     0.00 tokens per second)","n_prompt_tokens_processed":1,"n_tokens_second":0.0023256470326799734,"slot_id":0,"t_prompt_processing":429987.864,"t_token":429987.864,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":274,"msg":"generation eval time =   57040.20 ms /   233 runs   (  244.81 ms per token,     4.08 tokens per second)","n_decoded":233,"n_tokens_second":4.084838197367565,"slot_id":0,"t_token":244.8077381974249,"t_token_generation":57040.203,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":283,"msg":"          total time =  487028.07 ms","slot_id":0,"t_prompt_processing":429987.864,"t_token_generation":57040.203,"t_total":487028.067,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"update_slots","level":"INFO","line":1626,"msg":"slot released","n_cache_tokens":234,"n_ctx":4096,"n_past":3129,"n_system_tokens":0,"slot_id":0,"task_id":0,"tid":"8368","timestamp":1709356420,"truncated":false}
{"function":"log_server_request","level":"INFO","line":2693,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"127.0.0.1","remote_port":55351,"status":200,"tid":"7172","timestamp":1709356420}