Commit e0feefd

Merge branch 'master' into xiyon/add-cb-for-targetqps

2 parents 8611853 + aaa8f88

11 files changed: +1301 / -966 lines

README.md
Lines changed: 4 additions & 4 deletions

@@ -26,7 +26,7 @@ See the individual Readme files in the reference app for details.
 | ssd-resnet34 1200x1200 | [vision/classification_and_detection](https://github.com/mlcommons/inference/tree/master/vision/classification_and_detection) | tensorflow, pytorch, onnx | coco resized to 1200x1200|
 | bert | [language/bert](https://github.com/mlcommons/inference/tree/master/language/bert) | tensorflow, pytorch, onnx | squad-1.1 |
 | dlrm | [recommendation/dlrm](https://github.com/mlcommons/inference/tree/master/recommendation/dlrm/pytorch) | pytorch, tensorflow(?), onnx(?) | Criteo Terabyte |
-| 3d-unet | [vision/medical_imageing/3d-unet-kits19](https://github.com/mlcommons/inference/tree/master/vision/medical_imaging/3d-unet-kits19) | pytorch, tensorflow, onnx | KiTS19 |
+| 3d-unet | [vision/medical_imaging/3d-unet-kits19](https://github.com/mlcommons/inference/tree/master/vision/medical_imaging/3d-unet-kits19) | pytorch, tensorflow, onnx | KiTS19 |
 | rnnt | [speech_recognition/rnnt](https://github.com/mlcommons/inference/tree/master/speech_recognition/rnnt) | pytorch | OpenSLR LibriSpeech Corpus |
 
 
@@ -42,7 +42,7 @@ See the individual Readme files in the reference app for details.
 | ssd-resnet34 1200x1200 | [vision/classification_and_detection](https://github.com/mlcommons/inference/tree/r1.1/vision/classification_and_detection) | tensorflow, pytorch, onnx | coco resized to 1200x1200|
 | bert | [language/bert](https://github.com/mlcommons/inference/tree/r1.1/language/bert) | tensorflow, pytorch, onnx | squad-1.1 |
 | dlrm | [recommendation/dlrm](https://github.com/mlcommons/inference/tree/r1.1/recommendation/dlrm/pytorch) | pytorch, tensorflow(?), onnx(?) | Criteo Terabyte |
-| 3d-unet | [vision/medical_imageing/3d-unet](https://github.com/mlcommons/inference/tree/r1.1/vision/medical_imaging/3d-unet) | pytorch, tensorflow(?), onnx(?) | BraTS 2019 |
+| 3d-unet | [vision/medical_imaging/3d-unet](https://github.com/mlcommons/inference/tree/r1.1/vision/medical_imaging/3d-unet) | pytorch, tensorflow(?), onnx(?) | BraTS 2019 |
 | rnnt | [speech_recognition/rnnt](https://github.com/mlcommons/inference/tree/r1.1/speech_recognition/rnnt) | pytorch | OpenSLR LibriSpeech Corpus |
 
 ## MLPerf Inference v1.0 (submission 03/19/2021)
@@ -57,7 +57,7 @@ See the individual Readme files in the reference app for details.
 | ssd-resnet34 1200x1200 | [vision/classification_and_detection](https://github.com/mlcommons/inference/tree/r1.0/vision/classification_and_detection) | tensorflow, pytorch, onnx | coco resized to 1200x1200|
 | bert | [language/bert](https://github.com/mlcommons/inference/tree/r1.0/language/bert) | tensorflow, pytorch, onnx | squad-1.1 |
 | dlrm | [recommendation/dlrm](https://github.com/mlcommons/inference/tree/r1.0/recommendation/dlrm/pytorch) | pytorch, tensorflow(?), onnx(?) | Criteo Terabyte |
-| 3d-unet | [vision/medical_imageing/3d-unet](https://github.com/mlcommons/inference/tree/r1.0/vision/medical_imaging/3d-unet) | pytorch, tensorflow(?), onnx(?) | BraTS 2019 |
+| 3d-unet | [vision/medical_imaging/3d-unet](https://github.com/mlcommons/inference/tree/r1.0/vision/medical_imaging/3d-unet) | pytorch, tensorflow(?), onnx(?) | BraTS 2019 |
 | rnnt | [speech_recognition/rnnt](https://github.com/mlcommons/inference/tree/r1.0/speech_recognition/rnnt) | pytorch | OpenSLR LibriSpeech Corpus |
 
 
@@ -73,7 +73,7 @@ See the individual Readme files in the reference app for details.
 | ssd-resnet34 1200x1200 | [vision/classification_and_detection](https://github.com/mlcommons/inference/tree/r0.7/vision/classification_and_detection) | tensorflow, pytorch, onnx | coco resized to 1200x1200|
 | bert | [language/bert](https://github.com/mlcommons/inference/tree/r0.7/language/bert) | tensorflow, pytorch, onnx | squad-1.1 |
 | dlrm | [recommendation/dlrm](https://github.com/mlcommons/inference/tree/r0.7/recommendation/dlrm/pytorch) | pytorch, tensorflow(?), onnx(?) | Criteo Terabyte |
-| 3d-unet | [vision/medical_imageing/3d-unet](https://github.com/mlcommons/inference/tree/r0.7/vision/medical_imaging/3d-unet) | pytorch, tensorflow(?), onnx(?) | BraTS 2019 |
+| 3d-unet | [vision/medical_imaging/3d-unet](https://github.com/mlcommons/inference/tree/r0.7/vision/medical_imaging/3d-unet) | pytorch, tensorflow(?), onnx(?) | BraTS 2019 |
 | rnnt | [speech_recognition/rnnt](https://github.com/mlcommons/inference/tree/r0.7/speech_recognition/rnnt) | pytorch | OpenSLR LibriSpeech Corpus |
 
 ## MLPerf Inference v0.5

compliance/nvidia/TEST04-A/README.md
Lines changed: 8 additions & 6 deletions

@@ -24,18 +24,20 @@ This test requires measuring & comparing performance of SUT (PerformanceOnly, mo
 Test script works best with Python 3.3 or later.
 
 ## Exempt Benchmarks
-This test is not applicable for the following benchmarks whose performance is dependent on variably sized input samples:
-1. RNNT
-2. BERT
-3. DLRM
+This test is not applicable for the following benchmarks whose performance is dependent on variably sized input samples
+1. rnnt
+2. bert
+3. dlrm
+4. 3d-unet
 
 ## Scenarios
 
-- This test is applicable for scenarios Offline, Server and SingleStream always.
-- This test is not applicable for Multi-Stream scenario if samples_per_query >= Performance Sample Count
+- As of v2.0, this test is applicable for all valid scenarios of non-exempt benchmarks.
 
 ## Pass Criteria
 Performance of TEST04-B should be slower than performance of TEST04-A. To account for noise, TEST04-A can be upto 20% slower than TEST04-B for SingleStream scenario with very short latencies (<200us) & upto 10% slower otherwise.
+Significant run-to-run variation can result due to the small number of samples in this test.
+To compensate, the performance sample count may be increased to increase the number of samples in the test, up to the size of the dataset or the size that still fits in the SUT's memory, whichever is reached first.
 
 ## Instructions
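Editor's note: the new Pass Criteria text says the performance sample count may be raised to dampen run-to-run variation. In the reference apps that count is the second argument to LoadGen's ConstructQSL. A minimal sketch, with hypothetical counts and no-op load callbacks (none of these values appear in this commit):

# Sketch only: raising the performance sample count so TEST04 draws on more
# samples. total_count and perf_count are hypothetical; a real benchmark must
# respect the dataset size and the SUT's memory limit.
import mlperf_loadgen as lg

total_count = 24576   # hypothetical dataset size
perf_count = 1024     # raised from the benchmark's default

def load_samples_to_ram(sample_indices):
    pass  # a real SUT would stage these samples in memory here

def unload_samples_from_ram(sample_indices):
    pass

qsl = lg.ConstructQSL(total_count, perf_count,
                      load_samples_to_ram, unload_samples_from_ram)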

language/bert/pytorch_SUT.py

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@
2525
import mlperf_loadgen as lg
2626
import numpy as np
2727
import torch
28+
import transformers
2829
from transformers import BertConfig, BertForQuestionAnswering
2930
from squad_QSL import get_squad_QSL
3031

@@ -48,11 +49,13 @@ def __init__(self, args):
4849
vocab_size=config_json["vocab_size"])
4950

5051
self.dev = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
52+
self.version = transformers.__version__
5153

5254
print("Loading PyTorch model...")
5355
self.model = BertForQuestionAnswering(config)
5456
self.model.to(self.dev)
55-
self.model.load_state_dict(torch.load("build/data/bert_tf_v1_1_large_fp32_384_v2/model.pytorch"), strict=False)
57+
self.model.eval()
58+
self.model.load_state_dict(torch.load("build/data/bert_tf_v1_1_large_fp32_384_v2/model.pytorch"), strict=True)
5659

5760
print("Constructing SUT...")
5861
self.sut = lg.ConstructSUT(self.issue_queries, self.flush_queries, self.process_latencies)
@@ -67,8 +70,11 @@ def issue_queries(self, query_samples):
6770
model_output = self.model.forward(input_ids=torch.LongTensor(eval_features.input_ids).unsqueeze(0).to(self.dev),
6871
attention_mask=torch.LongTensor(eval_features.input_mask).unsqueeze(0).to(self.dev),
6972
token_type_ids=torch.LongTensor(eval_features.segment_ids).unsqueeze(0).to(self.dev))
70-
start_scores = model_output.start_logits
71-
end_scores = model_output.end_logits
73+
if self.version >= '4.0.0':
74+
start_scores = model_output.start_logits
75+
end_scores = model_output.end_logits
76+
else:
77+
start_scores, end_scores = model_output
7278
output = torch.stack([start_scores, end_scores], axis=-1).squeeze(0).cpu().numpy()
7379

7480
response_array = array.array("B", output.tobytes())
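Editor's note: one caveat in the hunk above is that `self.version >= '4.0.0'` compares version strings lexicographically, which misorders multi-digit components (as strings, '10.0.0' < '4.0.0'). A sketch of a more robust check, assuming the packaging library is available; split_logits is a hypothetical helper, not part of this commit:

# Sketch only: parse versions instead of comparing raw strings.
import transformers
from packaging import version

# transformers >= 4.0 returns a QuestionAnsweringModelOutput object with
# .start_logits / .end_logits by default; 3.x returns a plain tuple.
USE_OUTPUT_OBJECT = version.parse(transformers.__version__) >= version.parse("4.0.0")

def split_logits(model_output):
    if USE_OUTPUT_OBJECT:
        return model_output.start_logits, model_output.end_logits
    start_scores, end_scores = model_output
    return start_scores, end_scores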

loadgen/loadgen.cc
Lines changed: 19 additions & 19 deletions

@@ -183,22 +183,24 @@ auto SampleDistribution<TestMode::PerformanceOnly>(size_t sample_count,
 }
 
 /// \brief SampleDistribution for 3D-UNet SingleStream, for v2.0
-// FIXME: meant for 3D UNet SingleStream only at the moment but the logic should work for others
-// TODO: consolidate the distribution generator after v2.0
-auto SampleDistributionEqualIssue(size_t sample_count, size_t set_size, std::mt19937* rng) {
+// FIXME: meant for 3D UNet SingleStream only at the moment but the logic should
+// work for others
+// TODO: consolidate the distribution generator after v2.0
+auto SampleDistributionEqualIssue(size_t sample_count, size_t set_size,
+                                  std::mt19937* rng) {
   std::vector<size_t> indices;
   std::vector<size_t> shuffle_indices(set_size);
   std::iota(shuffle_indices.begin(), shuffle_indices.end(), 0);
   for (size_t j = 0; j < sample_count; j += set_size) {
     std::shuffle(shuffle_indices.begin(), shuffle_indices.end(), *rng);
-    indices.insert(indices.end(), shuffle_indices.begin(), shuffle_indices.end());
+    indices.insert(indices.end(), shuffle_indices.begin(),
+                   shuffle_indices.end());
   }
   return [indices = std::move(indices), i = size_t(0)](auto& /*gen*/) mutable {
-    return indices.at((i++)%indices.size());
+    return indices.at((i++) % indices.size());
   };
 }
 
-
 /// \brief Generates queries for the requested settings, templated by
 /// scenario and mode.
 /// \todo Make GenerateQueries faster.
@@ -262,10 +264,8 @@ std::vector<QueryMetadata> GenerateQueries(
 
   // FIXME: Only used for v2.0 3D-UNet KiTS19 SingleStream
   // TODO: Need to consolidate the code for any generic usage after v2.0
-  auto sample_distribution_equal_issue =
-      SampleDistributionEqualIssue(min_queries,
-                                   loaded_samples.size(),
-                                   &sample_rng);
+  auto sample_distribution_equal_issue = SampleDistributionEqualIssue(
+      min_queries, loaded_samples.size(), &sample_rng);
 
   auto schedule_distribution =
       ScheduleDistribution<scenario>(settings.target_qps);
@@ -340,12 +340,11 @@ std::vector<QueryMetadata> GenerateQueries(
         scenario == TestScenario::SingleStream;
     for (auto& s : samples) {
       s = loaded_samples[settings.performance_issue_unique
-                             ? sample_distribution_unique(sample_rng)
-                             : settings.performance_issue_same
-                                   ? same_sample
-                                   : equal_issue
-                                         ? sample_distribution_equal_issue(sample_rng)
-                                         : sample_distribution(sample_rng)];
+                             ? sample_distribution_unique(sample_rng)
+                         : settings.performance_issue_same ? same_sample
+                         : equal_issue
+                             ? sample_distribution_equal_issue(sample_rng)
+                             : sample_distribution(sample_rng)];
     }
   }
   queries.emplace_back(samples, timestamp, response_delegate, sequence_gen);
@@ -653,7 +652,6 @@ void PerformanceSummary::ProcessLatencies() {
   // Calculate per-query stats.
   size_t query_count = pr.queries_issued;
   assert(pr.query_latencies.size() == query_count);
-  assert(pr.query_intervals.size() == query_count);
   std::sort(pr.query_latencies.begin(), pr.query_latencies.end());
   QuerySampleLatency accumulated_query_latency = 0;
   for (auto latency : pr.query_latencies) {
@@ -1058,13 +1056,15 @@ void PerformanceSummary::LogDetail(AsyncDetail& detail) {
     }
     MLPERF_LOG(detail, "result_invalid_reason", recommendation);
   }
+  std::replace(early_stopping_recommendation.begin(),
+               early_stopping_recommendation.end(), '\n', ' ');
   MLPERF_LOG(detail, "early_stopping_result", early_stopping_recommendation);
 
   // Report number of queries
-  MLPERF_LOG(detail, "result_query_count", std::to_string(query_count));
+  MLPERF_LOG(detail, "result_query_count", query_count);
   if (settings.scenario == TestScenario::Server) {
     MLPERF_LOG(detail, "result_overlatency_query_count",
-               std::to_string(overlatency_query_count));
+               overlatency_query_count);
   }
 
   auto reportPerQueryLatencies = [&]() {
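Editor's note: SampleDistributionEqualIssue above builds its index list by concatenating whole shuffled permutations of the loaded set, so every sample is issued a near-equal number of times regardless of how long each sample takes. A Python sketch of the same idea (not part of this commit; equal_issue_indices is a hypothetical name):

# Sketch of the equal-issue distribution: emit whole shuffled permutations
# until at least sample_count indices exist, then hand them out round-robin.
import random

def equal_issue_indices(sample_count, set_size, seed=0):
    rng = random.Random(seed)
    base = list(range(set_size))
    indices = []
    while len(indices) < sample_count:
        rng.shuffle(base)
        indices.extend(base)  # one full permutation per pass
    return indices

# With 8 queries over a 3-sample set, each sample appears at least twice;
# a plain uniform draw could let one slow or fast sample dominate the run.
print(equal_issue_indices(8, 3))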

loadgen/test_settings_internal.cc
Lines changed: 4 additions & 3 deletions

@@ -120,9 +120,10 @@ TestSettingsInternal::TestSettingsInternal(
 
   // Sample by concatentating several permutations of the dataset
   // sample_concatenate_permutation
-  sample_concatenate_permutation = (requested.sample_concatenate_permutation == 0)
-      ? false
-      : requested.sample_concatenate_permutation;
+  sample_concatenate_permutation =
+      (requested.sample_concatenate_permutation == 0)
+          ? false
+          : requested.sample_concatenate_permutation;
 
   // Samples per query.
   if (requested.scenario == TestScenario::MultiStream) {

loadgen/version_generator.py
Lines changed: 1 addition & 1 deletion

@@ -93,7 +93,7 @@ def generate_loadgen_version_definitions(cc_filename, loadgen_root):
     ofile.write("// DO NOT EDIT: Autogenerated by version_generator.py.\n\n")
     ofile.write("#include <string>\n\n")
     ofile.write("namespace mlperf {\n\n")
-    ofile.write(func_def("Version", "\"1.1\""))
+    ofile.write(func_def("Version", "\"2.0\""))
 
     date_time_now_local = datetime.datetime.now().isoformat()
     date_time_now_utc = datetime.datetime.utcnow().isoformat()

speech_recognition/rnnt/accuracy_eval.py
Lines changed: 1 addition & 3 deletions

@@ -8,7 +8,6 @@
 
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "pytorch"))
 
-from QSL import AudioQSL
 from helpers import process_evaluation_epoch, __gather_predictions
 from parts.manifest import Manifest
 
@@ -31,8 +30,7 @@ def get_args():
 def main():
     args = get_args()
     labels = [" ", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]
-    qsl = AudioQSL(args.dataset_dir, args.manifest, labels)
-    manifest = qsl.manifest
+    manifest = Manifest(args.dataset_dir, [args.manifest], labels, len(labels), normalize=True, max_duration=15.0)
    with open(os.path.join(args.log_dir, "mlperf_log_accuracy.json")) as fh:
        results = json.load(fh)
    hypotheses = []
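Editor's note: the script now builds the Manifest directly instead of constructing an AudioQSL just to reach its manifest. For context, a sketch of how entries in mlperf_log_accuracy.json are commonly decoded in the reference accuracy scripts; the "qsl_idx"/"data" field names and the int64 ("q") packing are assumptions, not shown in this diff:

# Sketch only: decoding hex-encoded label ids from a LoadGen accuracy log.
import array
import json

labels = [" ", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
          "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]

with open("mlperf_log_accuracy.json") as fh:
    results = json.load(fh)

hypotheses = []
for result in results:
    # assumed layout: "qsl_idx" names the sample, "data" holds hex int64 ids
    ids = array.array("q", bytes.fromhex(result["data"])).tolist()
    hypotheses.append("".join(labels[i] for i in ids))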
