Closed
Description
Your current environment
The bug is not related to the environment
Model Input Dumps
The bug is not related to the model
🐛 Describe the bug
QUESTION 1:
How are the RequestMetrics in RequestOutput calculated?
Please see the screenshot below (highlighted in yellow):
I found here, at line 696, that last_token_time is set equal to arrival_time.
Is this a bug?
Could you also tell me what unit these times are in: seconds or nanoseconds? I believe the value is produced by something like the example below (correct me if I am wrong):
import time
arrival_time = time.perf_counter()
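On the unit question, a small sketch may help frame it. In Python, both time.time() and time.perf_counter() return a float number of seconds; only the *_ns variants return integer nanoseconds. Which clock the library actually uses for arrival_time is not confirmed here, and the helper name time_call below is hypothetical, for illustration only.

```python
import time


def time_call(fn):
    """Run fn() and return (result, elapsed_seconds).

    time.perf_counter() returns fractional seconds from a monotonic clock,
    so the difference of two readings is a duration in seconds.
    """
    start = time.perf_counter()
    result = fn()
    elapsed_seconds = time.perf_counter() - start
    return result, elapsed_seconds


if __name__ == "__main__":
    _, elapsed = time_call(lambda: time.sleep(0.05))
    print(f"elapsed: {elapsed:.3f} s")
    # The nanosecond variant returns an int instead of a float.
    print(type(time.perf_counter_ns()).__name__)
```

If the stored timestamps are wall-clock values from time.time(), they are seconds since the Unix epoch, so differences between them are still durations in seconds.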
QUESTION 2:
How can I calculate output tokens/second, TTFT (time to first token), TBT (time between tokens), throughput, and total time?
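To make the question concrete, here is a minimal sketch of how I would expect these metrics to be derived, assuming all timestamps are in seconds and that a timestamp is recorded for every output token. The RequestTiming class and both function names are hypothetical and are not part of the library; they only illustrate the arithmetic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestTiming:
    """Hypothetical per-request record; all timestamps are in seconds."""
    arrival_time: float
    token_times: List[float]  # one timestamp per generated output token


def request_metrics(t: RequestTiming) -> Dict[str, float]:
    """Derive latency metrics for a single request."""
    if not t.token_times:
        raise ValueError("request produced no output tokens")
    n = len(t.token_times)
    ttft = t.token_times[0] - t.arrival_time         # time to first token
    total_time = t.token_times[-1] - t.arrival_time  # end-to-end latency
    decode_time = t.token_times[-1] - t.token_times[0]
    # Mean time between tokens: n tokens have n - 1 gaps between them.
    tbt = decode_time / (n - 1) if n > 1 else 0.0
    tokens_per_second = n / total_time if total_time > 0 else float("inf")
    return {
        "ttft": ttft,
        "tbt": tbt,
        "total_time": total_time,
        "output_tokens_per_second": tokens_per_second,
    }


def throughput(requests: List[RequestTiming]) -> float:
    """Aggregate output tokens per second over the whole batch window."""
    total_tokens = sum(len(r.token_times) for r in requests)
    window_start = min(r.arrival_time for r in requests)
    window_end = max(r.token_times[-1] for r in requests)
    return total_tokens / (window_end - window_start)


if __name__ == "__main__":
    req = RequestTiming(10.0, [10.5, 10.75, 11.0, 11.25, 11.5])
    print(request_metrics(req))
    print(throughput([req, RequestTiming(10.5, [11.0, 12.0])]))
```

If only arrival_time, first_token_time, and a finish timestamp are available, the same formulas apply with the first and last entries of token_times replaced by those two values and the token count taken from the output.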
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.