| 1 | +# Metrics Documentation |
| 2 | + |
| 3 | +GuideLLM provides a set of metrics for evaluating and optimizing the performance of large language model (LLM) deployments. These metrics help users understand how their models behave under different load conditions, identify bottlenecks, and make informed decisions about scaling and resource allocation. The sections below define the key metrics measured by GuideLLM, describe their use cases, and explain how to interpret them. |
| 4 | + |
| 5 | +## Request Status Metrics |
| 6 | + |
| 7 | +### Successful, Incomplete, and Error Requests |
| 8 | + |
| 9 | +- **Successful Requests**: The number of requests that were completed successfully without any errors. |
| 10 | +- **Incomplete Requests**: The number of requests that were started but not completed, often due to timeouts or interruptions. |
| 11 | +- **Error Requests**: The number of requests that failed due to errors, such as invalid inputs or server issues. |
| 12 | + |
| 13 | +These metrics break down requests by final status, helping users assess the reliability and stability of their LLM deployment. |
| 14 | + |
| 15 | +### Requests Made |
| 16 | + |
| 17 | +- **Definition**: The total number of requests made during a benchmark run, broken down by status (successful, incomplete, error). |
| 18 | +- **Use Case**: Helps gauge the workload handled by the system and identify the proportion of requests that were successful versus those that failed or were incomplete. |
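As a rough illustration of the status breakdown described above, the sketch below tallies request outcomes and derives the share of each status. The `statuses` list and its labels are hypothetical sample data, not GuideLLM output.

```python
from collections import Counter

# Hypothetical per-request outcomes (illustrative data only).
statuses = ["successful", "successful", "error", "incomplete", "successful"]

counts = Counter(statuses)
total = sum(counts.values())

# Share of each status among all requests made.
breakdown = {
    status: counts.get(status, 0) / total
    for status in ("successful", "incomplete", "error")
}

for status, share in breakdown.items():
    print(f"{status}: {counts.get(status, 0)} requests ({share:.0%})")
```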
| 19 | + |
| 20 | +## Token Metrics |
| 21 | + |
| 22 | +### Prompt Tokens and Counts |
| 23 | + |
| 24 | +- **Definition**: The number of tokens in the input prompts sent to the LLM. |
| 25 | +- **Use Case**: Useful for understanding the size of the inputs and how prompt length affects model performance. |
| 26 | + |
| 27 | +### Output Tokens and Counts |
| 28 | + |
| 29 | +- **Definition**: The number of tokens generated by the LLM in response to the input prompts. |
| 30 | +- **Use Case**: Helps evaluate the model's output length and its correlation with latency and resource usage. |
| 31 | + |
| 32 | +## Performance Metrics |
| 33 | + |
| 34 | +### Request Rate (Requests Per Second) |
| 35 | + |
| 36 | +- **Definition**: The number of requests processed per second. |
| 37 | +- **Use Case**: Indicates the throughput of the system and its ability to handle concurrent workloads. |
| 38 | + |
| 39 | +### Request Concurrency |
| 40 | + |
| 41 | +- **Definition**: The number of requests being processed simultaneously. |
| 42 | +- **Use Case**: Helps evaluate the system's capacity to handle parallel workloads. |
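One way to reason about request rate and concurrency is to derive both from per-request start and end times. The sketch below is a minimal illustration under that assumption; the function names and the `intervals` data are hypothetical, and GuideLLM's own calculation may differ.

```python
def request_rate(intervals):
    """Requests divided by the wall-clock span of the run (requests/second)."""
    start = min(s for s, _ in intervals)
    end = max(e for _, e in intervals)
    return len(intervals) / (end - start)


def mean_concurrency(intervals):
    """Time-weighted average number of requests in flight.

    The summed request durations equal the integral of the in-flight
    count over time, so dividing by the span gives the average.
    """
    start = min(s for s, _ in intervals)
    end = max(e for _, e in intervals)
    busy_time = sum(e - s for s, e in intervals)
    return busy_time / (end - start)


# Hypothetical (start, end) timestamps in seconds.
intervals = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0)]
rate = request_rate(intervals)
concurrency = mean_concurrency(intervals)
print(rate, concurrency)
```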
| 43 | + |
| 44 | +### Output Tokens Per Second |
| 45 | + |
| 46 | +- **Definition**: The average number of output tokens generated per second as a throughput metric across all requests. |
| 47 | +- **Use Case**: Provides insights into the server's performance and efficiency in generating output tokens. |
| 48 | + |
| 49 | +### Total Tokens Per Second |
| 50 | + |
| 51 | +- **Definition**: The combined rate of prompt and output tokens processed per second as a throughput metric across all requests. |
| 52 | +- **Use Case**: Provides insights into the server's overall performance and efficiency in processing both prompt and output tokens. |
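The two throughput metrics above differ only in which tokens they count. A minimal sketch, assuming each request records its prompt and output token counts and that the benchmark duration is known (the dictionary keys and sample values are hypothetical):

```python
def token_throughput(requests, duration_s):
    """Return (output tokens/second, total tokens/second) for a run."""
    output_tokens = sum(r["output_tokens"] for r in requests)
    prompt_tokens = sum(r["prompt_tokens"] for r in requests)
    output_tps = output_tokens / duration_s
    total_tps = (prompt_tokens + output_tokens) / duration_s
    return output_tps, total_tps


# Hypothetical per-request token counts over a 10-second run.
requests = [
    {"prompt_tokens": 100, "output_tokens": 50},
    {"prompt_tokens": 200, "output_tokens": 150},
]
output_tps, total_tps = token_throughput(requests, duration_s=10.0)
print(output_tps, total_tps)
```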
| 53 | + |
| 54 | +### Request Latency |
| 55 | + |
| 56 | +- **Definition**: The total time taken to process a single request, from when the request starts until its response is complete. |
| 57 | +- **Use Case**: A critical metric for evaluating the responsiveness of the system. |
| 58 | + |
| 59 | +### Time to First Token (TTFT) |
| 60 | + |
| 61 | +- **Definition**: The time from the start of a request until the first token of the output is generated. |
| 62 | +- **Use Case**: Indicates the initial response time of the model, which is crucial for user-facing applications. |
| 63 | + |
| 64 | +### Inter-Token Latency (ITL) |
| 65 | + |
| 66 | +- **Definition**: The average time between generating consecutive tokens in the output, excluding the first token. |
| 67 | +- **Use Case**: Helps assess the smoothness and speed of token generation. |
| 68 | + |
| 69 | +### Time Per Output Token |
| 70 | + |
| 71 | +- **Definition**: The average time taken to generate each output token, including the first token. |
| 72 | +- **Use Case**: Provides a detailed view of the model's token generation efficiency. |
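The four latency metrics above can all be derived from a request's start time and the arrival time of each output token. The sketch below illustrates the definitions under two assumptions: the request is treated as finished when its last token arrives, and the function and field names are hypothetical rather than part of GuideLLM.

```python
def latency_metrics(start, token_times):
    """Derive latency metrics from ascending token arrival times (seconds)."""
    if not token_times:
        raise ValueError("at least one output token is required")
    n = len(token_times)
    request_latency = token_times[-1] - start
    ttft = token_times[0] - start
    # ITL averages the gaps between consecutive tokens, so the wait for
    # the first token is excluded.
    itl = (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else 0.0
    # Time per output token spreads the whole request over every token,
    # so the wait for the first token is included.
    tpot = request_latency / n
    return {
        "request_latency": request_latency,
        "ttft": ttft,
        "itl": itl,
        "tpot": tpot,
    }


# Hypothetical request: first token after 0.5 s, then one token every 0.1 s.
metrics = latency_metrics(0.0, [0.5, 0.6, 0.7, 0.8, 0.9])
print(metrics)
```

With a slow first token, time per output token is larger than ITL, which is why the two are reported separately.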
| 73 | + |
| 74 | +## Statistical Summaries |
| 75 | + |
| 76 | +GuideLLM provides detailed statistical summaries for each of the above metrics using the `StatusDistributionSummary` and `DistributionSummary` models. These summaries include the following statistics: |
| 77 | + |
| 78 | +### Summary Statistics |
| 79 | + |
| 80 | +- **Mean**: The average value of the metric. |
| 81 | +- **Median**: The middle value of the metric when sorted. |
| 82 | +- **Mode**: The most frequently occurring value of the metric. |
| 83 | +- **Variance**: The average squared deviation of the metric's values from the mean, measuring how widely the values vary. |
| 84 | +- **Standard Deviation (Std Dev)**: The square root of the variance, indicating the spread of the values in the same units as the metric. |
| 85 | +- **Min**: The minimum value of the metric. |
| 86 | +- **Max**: The maximum value of the metric. |
| 87 | +- **Count**: The total number of data points for the metric. |
| 88 | +- **Total Sum**: The sum of all values for the metric. |
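To make these statistics concrete, the sketch below computes each one with Python's standard `statistics` module on a small, hypothetical sample. It uses the population variance (`pvariance`) as an assumption; whether GuideLLM uses the population or sample form is not stated here.

```python
import statistics

# Hypothetical request latencies in milliseconds.
values = [120, 150, 150, 180, 200]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "variance": statistics.pvariance(values),  # population variance (assumed)
    "std_dev": statistics.pstdev(values),
    "min": min(values),
    "max": max(values),
    "count": len(values),
    "total_sum": sum(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```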
| 89 | + |
| 90 | +### Percentiles |
| 91 | + |
| 92 | +GuideLLM calculates a comprehensive set of percentiles for each metric, including: |
| 93 | + |
| 94 | +- **0.1th Percentile (p001)**: The value below which 0.1% of the data falls. |
| 95 | +- **1st Percentile (p01)**: The value below which 1% of the data falls. |
| 96 | +- **5th Percentile (p05)**: The value below which 5% of the data falls. |
| 97 | +- **10th Percentile (p10)**: The value below which 10% of the data falls. |
| 98 | +- **25th Percentile (p25)**: The value below which 25% of the data falls. |
| 99 | +- **75th Percentile (p75)**: The value below which 75% of the data falls. |
| 100 | +- **90th Percentile (p90)**: The value below which 90% of the data falls. |
| 101 | +- **95th Percentile (p95)**: The value below which 95% of the data falls. |
| 102 | +- **99th Percentile (p99)**: The value below which 99% of the data falls. |
| 103 | +- **99.9th Percentile (p999)**: The value below which 99.9% of the data falls. |
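The percentile labels above can be illustrated with a small helper. This is a sketch that assumes linear interpolation between neighboring ranks; other percentile conventions exist, and the method GuideLLM uses is not specified here. The `percentile` function and sample data are hypothetical.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 100] on sorted data."""
    if not sorted_values:
        raise ValueError("no data")
    pos = (len(sorted_values) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


# Labels follow the list above; the data is a synthetic ramp from 1 to 1001.
levels = {
    "p001": 0.1, "p01": 1, "p05": 5, "p10": 10, "p25": 25,
    "p75": 75, "p90": 90, "p95": 95, "p99": 99, "p999": 99.9,
}
data = sorted(float(v) for v in range(1, 1002))
percentiles = {label: percentile(data, q) for label, q in levels.items()}

for label, value in percentiles.items():
    print(f"{label}: {value:.1f}")
```

Tail percentiles such as p99 and p999 are only meaningful with enough samples; on a small run they collapse toward the maximum.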
| 104 | + |
| 105 | +### Use Cases for Statistical Summaries |
| 106 | + |
| 107 | +- **Mean and Median**: Provide a central tendency of the metric values. |
| 108 | +- **Variance and Std Dev**: Indicate the variability and consistency of the metric. |
| 109 | +- **Min and Max**: Highlight the range of the metric values. |
| 110 | +- **Percentiles**: Offer a detailed view of the distribution, helping identify outliers and performance at different levels of service. |
| 111 | + |
| 112 | +Together, these metrics and statistical summaries help users understand how their LLM deployments behave, tune performance, and plan for scalability and cost-effectiveness. |