This repository was archived by the owner on Aug 7, 2025. It is now read-only.

Commit 25f3700

Authored by Naman Nandan (namannandan)

BERT nightly benchmark on Inferentia2 (#2283)

* Inf2 nightly benchmark
* fix linter spellcheck error

Co-authored-by: Naman Nandan <namannan@amazon.com>

Parent: f01868f

File tree: 7 files changed, +140 −4 lines

.github/workflows/benchmark_nightly.yml

Lines changed: 6 additions & 1 deletion
@@ -10,7 +10,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        hardware: [cpu, gpu, inf1]
+        hardware: [cpu, gpu, inf1, inf2]
     runs-on:
       - self-hosted
       - ${{ matrix.hardware }}
@@ -52,6 +52,11 @@ jobs:
         env:
           NEURON_RT_NUM_CORES: 4
         run: python benchmarks/auto_benchmark.py --input benchmarks/benchmark_config_neuron.yaml --skip false
+      - name: Benchmark inf2 nightly
+        if: ${{ matrix.hardware == 'inf2' }}
+        env:
+          NEURON_RT_NUM_CORES: 1
+        run: python benchmarks/auto_benchmark.py --input benchmarks/benchmark_config_neuronx.yaml --skip false
       - name: Save benchmark artifacts
         uses: actions/upload-artifact@v2
         with:
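The new inf2 step boils down to one environment variable and one driver invocation. A minimal Python sketch of the same step, assuming a TorchServe repo checkout (the actual run is left commented out because it needs an inf2 host with the Neuron SDK):

```python
import os
import sys

# Sketch of the new workflow step: run the benchmark driver with the Neuron
# runtime restricted to a single core on inf2.
env = dict(os.environ, NEURON_RT_NUM_CORES="1")
cmd = [
    sys.executable,
    "benchmarks/auto_benchmark.py",
    "--input", "benchmarks/benchmark_config_neuronx.yaml",
    "--skip", "false",
]
# import subprocess; subprocess.run(cmd, env=env, check=True)  # needs inf2 hardware
print(" ".join(cmd[1:]))
```

Note that inf1 uses `NEURON_RT_NUM_CORES: 4` while inf2 uses `1`, matching the different core counts exposed by the two instance families in this setup.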

benchmarks/auto_benchmark.py

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ def load_config(self):

         self.bm_config["model_config_path"] = (
             "{}/{}".format(MODEL_JSON_CONFIG_PATH, self.bm_config["hardware"])
-            if self.bm_config["hardware"] in ["cpu", "gpu", "neuron"]
+            if self.bm_config["hardware"] in ["cpu", "gpu", "neuron", "neuronx"]
             else "{}/cpu".format(MODEL_JSON_CONFIG_PATH)
         )
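The hardware dispatch above can be exercised in isolation. Here `MODEL_JSON_CONFIG_PATH` is a stand-in value and `model_config_path` is a hypothetical helper extracted from `load_config`, not TorchServe's actual API:

```python
# Stand-in for the module constant in benchmarks/auto_benchmark.py
# (the real value lives in that file; this path is illustrative).
MODEL_JSON_CONFIG_PATH = "benchmarks/models_json"

def model_config_path(hardware):
    # Known targets get their own model-JSON directory;
    # anything else falls back to the cpu configs.
    if hardware in ["cpu", "gpu", "neuron", "neuronx"]:
        return "{}/{}".format(MODEL_JSON_CONFIG_PATH, hardware)
    return "{}/cpu".format(MODEL_JSON_CONFIG_PATH)

print(model_config_path("neuronx"))   # benchmarks/models_json/neuronx
print(model_config_path("trainium"))  # benchmarks/models_json/cpu
```

Without the added `"neuronx"` entry, the inf2 run would silently fall through to the cpu model configs.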

benchmarks/benchmark_config_neuronx.yaml

Lines changed: 45 additions & 0 deletions

@@ -0,0 +1,45 @@
+# Torchserve version to be installed. It can be one of these options:
+# - branch : "master"
+# - nightly: "2022.3.16"
+# - release: "0.5.3"
+# The nightly build is installed if "ts_version" is not specified.
+#ts_version:
+#  branch: &ts_version "master"
+
+# A list of model config yaml files defined in benchmarks/models_config,
+# or a list of model config yaml files with full paths.
+models:
+  - "bert_neuronx.yaml"
+
+# Benchmark on "cpu", "gpu", "neuron" or "neuronx".
+# "cpu" is used if "hardware" is not specified.
+hardware: &hardware "neuronx"
+
+# Upload the prometheus metrics report to remote storage, or to a different local path, if "metrics_cmd" is set.
+# This is the command line used to upload the prometheus metrics report to a remote system.
+# Here is an example using the AWS CloudWatch command.
+# Note:
+#   - keep the values in the same order as the command definition.
+#   - set up the command before enabling `metrics_cmd`.
+#     For example, the aws client and AWS credentials need to be set up before trying this example.
+metrics_cmd:
+  - "cmd": "aws cloudwatch put-metric-data"
+  - "--namespace": ["torchserve_benchmark_nightly_", *hardware]
+  - "--region": "us-east-2"
+  - "--metric-data": 'file:///tmp/benchmark/logs/stats_metrics.json'
+
+# Upload the report to remote storage, or to a different local path, if "report_cmd" is set.
+# This is the command line used to upload the report to remote storage.
+# Here is an example using the AWS S3 command.
+# Note:
+#   - keep the values in the same order as the command.
+#   - set up the command before enabling `report_cmd`.
+#     For example, the aws client, AWS credentials and the S3 bucket
+#     need to be set up before trying this example.
+#   - "today()" is a keyword that applies the current date in the path.
+#     For example, the dest path in the following example is
+#     s3://torchserve-model-serving/benchmark/2022-03-18/gpu
+report_cmd:
+  - "cmd": "aws s3 cp --recursive"
+  - "source": '/tmp/ts_benchmark/'
+  - "dest": ['s3://torchserve-benchmark/nightly', "today()", *hardware]
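The `&hardware` anchor is defined once and expanded wherever `*hardware` appears. A small PyYAML check of how the alias resolves on load (the snippet is a trimmed excerpt of the config above):

```python
import yaml  # PyYAML

# Trimmed excerpt: the &hardware anchor is defined once and the *hardware
# alias expands to the same scalar wherever it appears.
snippet = """
hardware: &hardware "neuronx"
metrics_cmd:
  - "cmd": "aws cloudwatch put-metric-data"
  - "--namespace": ["torchserve_benchmark_nightly_", *hardware]
"""
cfg = yaml.safe_load(snippet)
print(cfg["metrics_cmd"][1]["--namespace"])  # ['torchserve_benchmark_nightly_', 'neuronx']
```

This is why switching the single `hardware` value is enough to retarget both the CloudWatch namespace and the S3 destination path.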
benchmarks/models_config/bert_neuronx.yaml

Lines changed: 68 additions & 0 deletions

@@ -0,0 +1,68 @@
+---
+bert_neuronx_batch_1:
+  scripted_mode:
+    benchmark_engine: "ab"
+    url: https://torchserve.pytorch.org/mar_files/BERTSeqClassification_torchscript_neuronx_batch_1.mar
+    workers:
+      - 2
+    batch_delay: 100
+    batch_size:
+      - 1
+    input: "./examples/Huggingface_Transformers/Seq_classification_artifacts/sample_text.txt"
+    requests: 10000
+    concurrency: 100
+    backend_profiling: False
+    exec_env: "local"
+    processors:
+      - "neuronx"
+
+bert_neuronx_batch_2:
+  scripted_mode:
+    benchmark_engine: "ab"
+    url: https://torchserve.pytorch.org/mar_files/BERTSeqClassification_torchscript_neuronx_batch_2.mar
+    workers:
+      - 2
+    batch_delay: 100
+    batch_size:
+      - 2
+    input: "./examples/Huggingface_Transformers/Seq_classification_artifacts/sample_text.txt"
+    requests: 10000
+    concurrency: 100
+    backend_profiling: False
+    exec_env: "local"
+    processors:
+      - "neuronx"
+
+bert_neuronx_batch_4:
+  scripted_mode:
+    benchmark_engine: "ab"
+    url: https://torchserve.pytorch.org/mar_files/BERTSeqClassification_torchscript_neuronx_batch_4.mar
+    workers:
+      - 2
+    batch_delay: 100
+    batch_size:
+      - 4
+    input: "./examples/Huggingface_Transformers/Seq_classification_artifacts/sample_text.txt"
+    requests: 10000
+    concurrency: 100
+    backend_profiling: False
+    exec_env: "local"
+    processors:
+      - "neuronx"
+
+bert_neuronx_batch_8:
+  scripted_mode:
+    benchmark_engine: "ab"
+    url: https://torchserve.pytorch.org/mar_files/BERTSeqClassification_torchscript_neuronx_batch_8.mar
+    workers:
+      - 2
+    batch_delay: 100
+    batch_size:
+      - 8
+    input: "./examples/Huggingface_Transformers/Seq_classification_artifacts/sample_text.txt"
+    requests: 10000
+    concurrency: 100
+    backend_profiling: False
+    exec_env: "local"
+    processors:
+      - "neuronx"
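The four entries differ only in the batch size embedded in the key, the `.mar` URL, and the `batch_size` value. As a sketch (not part of the commit), such a file could be generated from a template instead of hand-maintained:

```python
# Hypothetical generator for the repetitive config above; only the batch
# size varies between entries.
TEMPLATE = """\
bert_neuronx_batch_{bs}:
  scripted_mode:
    benchmark_engine: "ab"
    url: https://torchserve.pytorch.org/mar_files/BERTSeqClassification_torchscript_neuronx_batch_{bs}.mar
    workers:
      - 2
    batch_delay: 100
    batch_size:
      - {bs}
    input: "./examples/Huggingface_Transformers/Seq_classification_artifacts/sample_text.txt"
    requests: 10000
    concurrency: 100
    backend_profiling: False
    exec_env: "local"
    processors:
      - "neuronx"
"""

doc = "---\n" + "\n".join(TEMPLATE.format(bs=bs) for bs in (1, 2, 4, 8))
print(doc.count("scripted_mode:"))  # 4
```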

examples/Huggingface_Transformers/Download_Transformer_models.py

Lines changed: 17 additions & 0 deletions
@@ -121,6 +121,23 @@ def transformers_model_dowloader(
                 "traced_{}_model_neuron_batch_{}.pt".format(model_name, batch_size),
             ),
         )
+    elif hardware == "neuronx":
+        import torch_neuronx
+
+        input_ids = torch.cat([inputs["input_ids"]] * batch_size, 0).to(device)
+        attention_mask = torch.cat([inputs["attention_mask"]] * batch_size, 0).to(
+            device
+        )
+        traced_model = torch_neuronx.trace(model, (input_ids, attention_mask))
+        torch.jit.save(
+            traced_model,
+            os.path.join(
+                NEW_DIR,
+                "traced_{}_model_neuronx_batch_{}.pt".format(
+                    model_name, batch_size
+                ),
+            ),
+        )
     else:
         input_ids = inputs["input_ids"].to(device)
         attention_mask = inputs["attention_mask"].to(device)
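The traced inputs are built by repeating a single tokenized example along dim 0, because Neuron tracing specializes the compiled graph to a fixed batch shape. A minimal sketch of that batch construction with stand-in tensors (no tokenizer or Neuron SDK needed):

```python
import torch

# Stand-ins for a single tokenized example of sequence length 128;
# the real values come from the Hugging Face tokenizer.
batch_size = 4
input_ids = torch.ones(1, 128, dtype=torch.long)
attention_mask = torch.ones(1, 128, dtype=torch.long)

# Repeat the example along dim 0 to form the fixed-size batch that
# torch_neuronx.trace is given in the diff above.
batched_ids = torch.cat([input_ids] * batch_size, 0)
batched_mask = torch.cat([attention_mask] * batch_size, 0)
print(tuple(batched_ids.shape))  # (4, 128)
```

Because the shape is baked in at trace time, a separate `.mar` is published per batch size, which is why the benchmark config lists `batch_1` through `batch_8` artifacts.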

examples/Huggingface_Transformers/README.md

Lines changed: 2 additions & 2 deletions
@@ -51,9 +51,9 @@ In the setup_config.json :

 *embedding_name* : The name of embedding layer in the chosen model, this could be `bert` for `bert-base-uncased`, `roberta` for `roberta-base` or `roberta` for `xlm-roberta-large`, or `gpt2` for `gpt2` model

-*hardware* : The target platform to trace the model for. Specify as `neuron` for [Inferentia1](https://aws.amazon.com/ec2/instance-types/inf1/).
+*hardware* : The target platform to trace the model for. Specify as `neuron` for [Inferentia1](https://aws.amazon.com/ec2/instance-types/inf1/) and `neuronx` for [Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/).

-*batch_size* : Input batch size when tracing the model for `neuron` as target hardware.
+*batch_size* : Input batch size when tracing the model for `neuron` or `neuronx` as target hardware.

 Once, `setup_config.json` has been set properly, the next step is to run

ts_scripts/spellcheck_conf/wordlist.txt

Lines changed: 1 addition & 0 deletions
@@ -1051,3 +1051,4 @@ largemodels
 torchpippy
 InferenceSession
 maxRetryTimeoutInSec
+neuronx
