[CI] Improve the readability of performance benchmarking results and prepare for upcoming performance dashboard #5571

Merged · 16 commits · Jun 17, 2024
21 changes: 13 additions & 8 deletions .buildkite/nightly-benchmarks/README.md
@@ -13,17 +13,24 @@ This benchmark will be *triggered* upon:

**Benchmarking Duration**: about 1hr.

## Configuring the workload for the quick benchmark
**For benchmarking developers**: please try your best to constrain the duration of benchmarking to less than 1.5 hr so that it won't take forever to run.

The workload of the quick benchmark contains three parts: latency tests in `latency-tests.json`, throughput tests in `throughput-tests.json`, and serving tests in `serving-tests.json`.

## Configuring the workload

The benchmarking workload contains three parts:
- Latency tests in `latency-tests.json`.
- Throughput tests in `throughput-tests.json`.
- Serving tests in `serving-tests.json`.

See [descriptions.md](tests/descriptions.md) for detailed descriptions.

### Latency test

Here is an example of one test inside `latency-tests.json`:

```json
[
...
{
"test_name": "latency_llama8B_tp1",
"parameters": {
@@ -34,7 +41,6 @@ Here is an example of one test inside `latency-tests.json`:
"num_iters": 15
}
},
...
]
```
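For intuition on how such an entry is consumed, here is a minimal sketch, assuming each key under `parameters` maps to a `benchmark_latency.py` command-line flag with underscores replaced by dashes; the helper below is illustrative only and is not part of the benchmarking harness.

```python
# Illustrative sketch (not part of this PR): expand a test entry's "parameters"
# dict into CLI flags for benchmark_latency.py. The flag-naming convention and
# the file location are assumptions.
import json


def params_to_cli_flags(parameters: dict) -> list:
    """Turn {"num_iters": 15, ...} into ["--num-iters", "15", ...]."""
    flags = []
    for key, value in parameters.items():
        flag = "--" + key.replace("_", "-")
        if isinstance(value, bool):
            if value:  # boolean switches are passed without a value
                flags.append(flag)
        else:
            flags.extend([flag, str(value)])
    return flags


if __name__ == "__main__":
    # Assumes the script is run next to latency-tests.json.
    with open("latency-tests.json") as f:
        tests = json.load(f)
    for test in tests:
        print(test["test_name"], " ".join(params_to_cli_flags(test["parameters"])))
```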

@@ -57,7 +63,6 @@ We test the throughput by using `benchmark_serving.py` with request rate = inf t

```
[
...
{
"test_name": "serving_llama8B_tp1_sharegpt",
"qps_list": [1, 4, 16, "inf"],
@@ -77,7 +82,6 @@ We test the throughput by using `benchmark_serving.py` with request rate = inf t
"num_prompts": 200
}
},
...
]
```

@@ -92,7 +96,8 @@ The number of this test is less stable compared to the delay and latency benchma
WARNING: The benchmarking script will save json results by itself, so please do not configure `--save-results` or other results-saving-related parameters in `serving-tests.json`.
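To catch such a misconfiguration early, one could run a small pre-flight check over the test file. This is only a sketch: the disallowed key names below are assumptions, and the check is not part of the benchmarking harness.

```python
# Hypothetical pre-flight check (not part of this PR): recursively scan a test
# config for result-saving options that the benchmarking script manages itself.
import json

# Assumed key names of result-saving options; adjust to the real flag set.
DISALLOWED_KEYS = {"save_results", "save_result", "result_dir", "result_filename"}


def find_disallowed_keys(node, path=""):
    """Yield the dotted path of every disallowed key found in the config."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key in DISALLOWED_KEYS:
                yield child
            yield from find_disallowed_keys(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_disallowed_keys(item, f"{path}[{index}]")


if __name__ == "__main__":
    with open("serving-tests.json") as f:
        config = json.load(f)
    for hit in find_disallowed_keys(config):
        print(f"Please remove the result-saving option at {hit}")
```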

## Visualizing the results
The `convert-results-json-to-markdown.py` script puts the benchmarking results into a markdown table.
The `convert-results-json-to-markdown.py` script puts the benchmarking results into a markdown table, by formatting [descriptions.md](tests/descriptions.md) with the real benchmarking results.
You can find the result presented as a table inside the `buildkite/performance-benchmark` job page.
If you do not see the table, please wait until the benchmark finishes running.
The JSON file is also attached within each buildkite job for further analysis.
The JSON version of the table (together with the JSON version of the benchmark) will also be attached to the markdown file.
The raw benchmarking results (as JSON files) are available in the `Artifacts` tab of the benchmarking job.
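For orientation before reading the full diff below, here is a condensed sketch of that flow: collect every result JSON into a pandas DataFrame, render it as a pipe-style markdown table with `tabulate`, and substitute the table into a markdown template. The paths and the one-line template are placeholders; the real script uses [descriptions.md](tests/descriptions.md) and handles latency, throughput, and serving results separately.

```python
# Condensed, simplified sketch of the convert-results-json-to-markdown.py flow.
import json
from pathlib import Path

import pandas as pd
from tabulate import tabulate

results_folder = Path("results")          # assumed location of the raw *.json results
rows = []
for result_file in results_folder.glob("*.json"):
    row = json.loads(result_file.read_text())
    row["test_name"] = result_file.stem   # the file name doubles as the test name
    rows.append(row)

df = pd.DataFrame(rows)
md_table = tabulate(df, headers="keys", tablefmt="pipe", showindex=False)

# The real script formats tests/descriptions.md; a one-line template stands in here.
template = "## Benchmark results\n\n{results_markdown_table}\n"
Path("benchmark_results.md").write_text(
    template.format(results_markdown_table=md_table))
```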
6 changes: 3 additions & 3 deletions .buildkite/nightly-benchmarks/run-benchmarks-suite.sh
@@ -343,9 +343,9 @@ main() {
QUICK_BENCHMARK_ROOT=../.buildkite/nightly-benchmarks/

# benchmarking
run_serving_tests $QUICK_BENCHMARK_ROOT/serving-tests.json
run_latency_tests $QUICK_BENCHMARK_ROOT/latency-tests.json
run_throughput_tests $QUICK_BENCHMARK_ROOT/throughput-tests.json
run_serving_tests $QUICK_BENCHMARK_ROOT/tests/serving-tests.json
run_latency_tests $QUICK_BENCHMARK_ROOT/tests/latency-tests.json
run_throughput_tests $QUICK_BENCHMARK_ROOT/tests/throughput-tests.json


# postprocess benchmarking results
260 changes: 145 additions & 115 deletions .buildkite/nightly-benchmarks/scripts/convert-results-json-to-markdown.py
@@ -1,4 +1,5 @@
import json
import os
from pathlib import Path

import pandas as pd
@@ -11,145 +12,174 @@
latency_column_mapping = {
"test_name": "Test name",
"gpu_type": "GPU",
"avg_latency": "Average latency (s)",
"P10": "P10 (s)",
"P25": "P25 (s)",
"P50": "P50 (s)",
"P75": "P75 (s)",
"P90": "P90 (s)",
"avg_latency": "Mean latency (ms)",
# "P10": "P10 (s)",
# "P25": "P25 (s)",
"P50": "Median",
# "P75": "P75 (s)",
# "P90": "P90 (s)",
"P99": "P99",
}

# throughput tests and the keys that will be printed into markdown
throughput_results = []
throughput_results_column_mapping = {
"test_name": "Test name",
"gpu_type": "GPU",
"num_requests": "# of req.",
"total_num_tokens": "Total # of tokens",
"elapsed_time": "Elapsed time (s)",
# "num_requests": "# of req.",
# "total_num_tokens": "Total # of tokens",
# "elapsed_time": "Elapsed time (s)",
"requests_per_second": "Tput (req/s)",
"tokens_per_second": "Tput (tok/s)",
# "tokens_per_second": "Tput (tok/s)",
}

# serving results and the keys that will be printed into markdown
serving_results = []
serving_column_mapping = {
"test_name": "Test name",
"gpu_type": "GPU",
"completed": "# of req.",
# "completed": "# of req.",
"request_throughput": "Tput (req/s)",
"input_throughput": "Input Tput (tok/s)",
"output_throughput": "Output Tput (tok/s)",
# "input_throughput": "Input Tput (tok/s)",
# "output_throughput": "Output Tput (tok/s)",
"mean_ttft_ms": "Mean TTFT (ms)",
# do not say TTFT again to avoid the table getting too wide
"median_ttft_ms": "Median",
"p99_ttft_ms": "P99",
"mean_tpot_ms": "Mean TPOT (ms)",
"median_tpot_ms": "Median",
"p99_tpot_ms": "P99",
# "mean_tpot_ms": "Mean TPOT (ms)",
# "median_tpot_ms": "Median",
# "p99_tpot_ms": "P99",
"mean_itl_ms": "Mean ITL (ms)",
"median_itl_ms": "Median",
"p99_itl_ms": "P99",
}

for test_file in results_folder.glob("*.json"):

with open(test_file, "r") as f:
raw_result = json.loads(f.read())

if "serving" in str(test_file):
# this result is generated via `benchmark_serving.py`

# attach the benchmarking command to raw_result
with open(test_file.with_suffix(".commands"), "r") as f:
command = json.loads(f.read())
raw_result.update(command)

# update the test name of this result
raw_result.update({"test_name": test_file.stem})

# add the result to raw_result
serving_results.append(raw_result)
continue

elif "latency" in f.name:
# this result is generated via `benchmark_latency.py`

# attach the benchmarking command to raw_result
with open(test_file.with_suffix(".commands"), "r") as f:
command = json.loads(f.read())
raw_result.update(command)

# update the test name of this result
raw_result.update({"test_name": test_file.stem})

# get different percentiles
for perc in [10, 25, 50, 75, 90]:
raw_result.update(
{f"P{perc}": raw_result["percentiles"][str(perc)]})

# add the result to raw_result
latency_results.append(raw_result)
continue

elif "throughput" in f.name:
# this result is generated via `benchmark_throughput.py`

# attach the benchmarking command to raw_result
with open(test_file.with_suffix(".commands"), "r") as f:
command = json.loads(f.read())
raw_result.update(command)

# update the test name of this result
raw_result.update({"test_name": test_file.stem})

# add the result to raw_result
throughput_results.append(raw_result)
continue

print(f"Skipping {test_file}")

latency_results = pd.DataFrame.from_dict(latency_results)
serving_results = pd.DataFrame.from_dict(serving_results)
throughput_results = pd.DataFrame.from_dict(throughput_results)

# remapping the key, for visualization purpose
if not latency_results.empty:
latency_results = latency_results[list(
latency_column_mapping.keys())].rename(columns=latency_column_mapping)
if not serving_results.empty:
serving_results = serving_results[list(
serving_column_mapping.keys())].rename(columns=serving_column_mapping)
if not throughput_results.empty:
throughput_results = throughput_results[list(
throughput_results_column_mapping.keys())].rename(
columns=throughput_results_column_mapping)

# get markdown tables
latency_md_table = tabulate(latency_results,
headers='keys',
tablefmt='pipe',
showindex=False)
serving_md_table = tabulate(serving_results,
headers='keys',
tablefmt='pipe',
showindex=False)
throughput_md_table = tabulate(throughput_results,
headers='keys',
tablefmt='pipe',
showindex=False)

# document the result
with open(results_folder / "benchmark_results.md", "w") as f:

def read_markdown(file):
if os.path.exists(file):
with open(file, "r") as f:
return f.read() + "\n"
else:
return f"{file} not found.\n"


def results_to_json(latency, throughput, serving):
return json.dumps({
'latency': latency.to_dict(),
'throughput': throughput.to_dict(),
'serving': serving.to_dict()
})


if __name__ == "__main__":

# collect results
for test_file in results_folder.glob("*.json"):

with open(test_file, "r") as f:
raw_result = json.loads(f.read())

if "serving" in str(test_file):
# this result is generated via `benchmark_serving.py`

# attach the benchmarking command to raw_result
with open(test_file.with_suffix(".commands"), "r") as f:
command = json.loads(f.read())
raw_result.update(command)

# update the test name of this result
raw_result.update({"test_name": test_file.stem})

# add the result to raw_result
serving_results.append(raw_result)
continue

elif "latency" in f.name:
# this result is generated via `benchmark_latency.py`

# attach the benchmarking command to raw_result
with open(test_file.with_suffix(".commands"), "r") as f:
command = json.loads(f.read())
raw_result.update(command)

# update the test name of this result
raw_result.update({"test_name": test_file.stem})

# get different percentiles
for perc in [10, 25, 50, 75, 90, 99]:
# Multiply by 1000 to convert the time unit from s to ms
raw_result.update(
{f"P{perc}": 1000 * raw_result["percentiles"][str(perc)]})
raw_result["avg_latency"] = raw_result["avg_latency"] * 1000

# add the result to raw_result
latency_results.append(raw_result)
continue

elif "throughput" in f.name:
# this result is generated via `benchmark_throughput.py`

# attach the benchmarking command to raw_result
with open(test_file.with_suffix(".commands"), "r") as f:
command = json.loads(f.read())
raw_result.update(command)

# update the test name of this result
raw_result.update({"test_name": test_file.stem})

# add the result to raw_result
throughput_results.append(raw_result)
continue

print(f"Skipping {test_file}")

latency_results = pd.DataFrame.from_dict(latency_results)
serving_results = pd.DataFrame.from_dict(serving_results)
throughput_results = pd.DataFrame.from_dict(throughput_results)

raw_results_json = results_to_json(latency_results, throughput_results,
serving_results)

# remapping the key, for visualization purpose
if not latency_results.empty:
f.write("## Latency tests\n")
f.write(latency_md_table)
f.write("\n")
if not throughput_results.empty:
f.write("## Throughput tests\n")
f.write(throughput_md_table)
f.write("\n")
latency_results = latency_results[list(
latency_column_mapping.keys())].rename(
columns=latency_column_mapping)
if not serving_results.empty:
f.write("## Serving tests\n")
f.write(serving_md_table)
f.write("\n")
serving_results = serving_results[list(
serving_column_mapping.keys())].rename(
columns=serving_column_mapping)
if not throughput_results.empty:
throughput_results = throughput_results[list(
throughput_results_column_mapping.keys())].rename(
columns=throughput_results_column_mapping)

processed_results_json = results_to_json(latency_results,
throughput_results,
serving_results)

# get markdown tables
latency_md_table = tabulate(latency_results,
headers='keys',
tablefmt='pipe',
showindex=False)
serving_md_table = tabulate(serving_results,
headers='keys',
tablefmt='pipe',
showindex=False)
throughput_md_table = tabulate(throughput_results,
headers='keys',
tablefmt='pipe',
showindex=False)

# document the result
with open(results_folder / "benchmark_results.md", "w") as f:

results = read_markdown(
"../.buildkite/nightly-benchmarks/tests/descriptions.md")
results = results.format(
latency_tests_markdown_table=latency_md_table,
throughput_tests_markdown_table=throughput_md_table,
serving_tests_markdown_table=serving_md_table,
benchmarking_results_in_json_string=processed_results_json)
f.write(results)
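Since `results_to_json` serializes each DataFrame via `to_dict()`, a downstream consumer such as the planned performance dashboard could rebuild the tables from that JSON string. A minimal sketch follows, assuming the string has already been extracted from `benchmark_results.md` into a file named `processed_results.json` (a hypothetical name, not produced by this PR):

```python
# Hypothetical downstream consumer (not part of this PR): rebuild pandas
# DataFrames from the JSON string produced by results_to_json().
import json

import pandas as pd

with open("processed_results.json") as f:   # assumed extraction of the embedded JSON
    payload = json.load(f)

latency_df = pd.DataFrame.from_dict(payload["latency"])
throughput_df = pd.DataFrame.from_dict(payload["throughput"])
serving_df = pd.DataFrame.from_dict(payload["serving"])

print(serving_df.head())
```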