Commit d8c8ea2
Author: Adam Wawrzyński

Fix some of the errors I've encountered while following tutorials

Signed-off-by: Adam Wawrzyński <adam.wawrzynski@reasonfieldlab.com>

1 parent: 2d90a3a

12 files changed: +20 −29 lines

Conceptual_Guide/Part_1-model_deployment/README.md
3 additions, 3 deletions

@@ -181,21 +181,21 @@ input [
     {
         name: "input_images:0"
         data_type: TYPE_FP32
-        dims: [ -1, -1, -1, 3 ]
+        dims: [ -1, -1, 3 ]
     }
 ]
 output [
     {
         name: "feature_fusion/Conv_7/Sigmoid:0"
         data_type: TYPE_FP32
-        dims: [ -1, -1, -1, 1 ]
+        dims: [ -1, -1, 1 ]
     }
 ]
 output [
     {
         name: "feature_fusion/concat_3:0"
         data_type: TYPE_FP32
-        dims: [ -1, -1, -1, 5 ]
+        dims: [ -1, -1, 5 ]
     }
 ]
 ```
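The dropped leading dimension is consistent with how Triton treats batching: when `max_batch_size` is set in a model configuration, the batch dimension is implicit and must be omitted from `dims`, so a model tensor shaped `[-1, -1, -1, 3]` is declared as `dims: [ -1, -1, 3 ]`. To cross-check a config against what the exported model actually exposes, a minimal sketch, assuming the ONNX file sits at the tutorial's usual repository location (the path is an assumption):

```python
# Print each input/output tensor of an ONNX model with its shape, using -1
# for dynamic dimensions, so it can be compared against config.pbtxt dims.
import onnx

model = onnx.load("model_repository/text_detection/1/model.onnx")
for tensor in list(model.graph.input) + list(model.graph.output):
    dims = [d.dim_value if d.dim_value > 0 else -1
            for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, dims)
```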

Conceptual_Guide/Part_1-model_deployment/client.py
1 addition, 1 deletion

@@ -207,6 +207,6 @@ def recognition_postprocessing(scores: np.ndarray) -> str:
 )

 # Process response from recognition model
-final_text = recognition_postprocessing(recognition_response.as_numpy("308"))
+final_text = recognition_postprocessing(recognition_response.as_numpy("307"))

 print(final_text)
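"308" and "307" are not meaningful labels: they are node names the PyTorch ONNX exporter assigns automatically, so they can drift between PyTorch versions, which is what this commit tracks. To see which name a given export actually produced, a short sketch (the model path is an assumption):

```python
# List the output tensor names baked into the exported recognition model.
import onnx

model = onnx.load("model_repository/text_recognition/1/model.onnx")
print([output.name for output in model.graph.output])
```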

Conceptual_Guide/Part_1-model_deployment/model_repository/text_detection/config.pbtxt
mode changed: 100644 → 100755
3 additions, 3 deletions

@@ -31,20 +31,20 @@ input [
     {
         name: "input_images:0"
         data_type: TYPE_FP32
-        dims: [ -1, -1, -1, 3 ]
+        dims: [ -1, -1, 3 ]
     }
 ]
 output [
     {
         name: "feature_fusion/Conv_7/Sigmoid:0"
         data_type: TYPE_FP32
-        dims: [ -1, -1, -1, 1 ]
+        dims: [ -1, -1, 1 ]
     }
 ]
 output [
     {
         name: "feature_fusion/concat_3:0"
         data_type: TYPE_FP32
-        dims: [ -1, -1, -1, 5 ]
+        dims: [ -1, -1, 5 ]
     }
 ]

Conceptual_Guide/Part_1-model_deployment/model_repository/text_recognition/config.pbtxt
1 addition, 1 deletion

@@ -36,7 +36,7 @@ input [
 ]
 output [
     {
-        name: "308"
+        name: "307"
        data_type: TYPE_FP32
        dims: [ 1, 26, 37 ]
     }
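Once the server is up, the same check can be made against what Triton actually serves rather than the file on disk. A sketch, assuming Triton is running locally on the default HTTP port with the tutorial's model name:

```python
# Ask a running Triton server for a model's metadata, which includes the
# declared input/output names, datatypes, and shapes.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
print(client.get_model_metadata("text_recognition"))
```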

Conceptual_Guide/Part_2-improving_resource_utilization/README.md
2 additions, 2 deletions

@@ -110,7 +110,7 @@ model.load_state_dict(state)

 # Create ONNX file by tracing model
 trace_input = torch.randn(1, 1, 32, 100)
-torch.onnx.export(model, trace_input, "str.onnx", verbose=True, dynamic_axes={'input.1':[0],'308':[0]})
+torch.onnx.export(model, trace_input, "str.onnx", verbose=True, dynamic_axes={'input.1':[0],'307':[0]})
 ```

 ### Launching the server
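The keys in `dynamic_axes` must match the ONNX graph's tensor names, and "307" (formerly "308") is one the exporter assigned on its own. One way to stop chasing such renames, sketched here as an alternative rather than the tutorial's approach (it reuses `model` and `trace_input` from the snippet above, and the chosen names are illustrative), is to pass explicit names at export time:

```python
# Export with explicit, stable tensor names so config.pbtxt and clients can
# refer to "input"/"output" instead of auto-generated names like "307".
torch.onnx.export(
    model,
    trace_input,
    "str.onnx",
    verbose=True,
    input_names=["input"],    # replaces the auto-generated "input.1"
    output_names=["output"],  # replaces the auto-generated "307"
    dynamic_axes={"input": [0], "output": [0]},
)
```

If the names are changed this way, every `config.pbtxt` and client call that references "307" has to be updated to match.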
@@ -231,7 +231,7 @@ Request concurrency: 16
 ```
 As each of the requests had a batch size (of 2), while the maximum batch size of the model was 8, dynamically batching these requests resulted in considerably improved throughput. Another consequence is reduced latency, primarily attributable to less time spent waiting in the queue. As the requests are batched together, multiple requests can be processed in parallel.

-* **Dynamic Batching with multiple model instances**: To set up the Triton Server in this configuration, add `instance_group` in `config.pbtxt` and make sure to include `--gpus=1` and make sure to include `--gpus=1` in the `docker run` command to set up the server. Include `dynamic_batching` per instructions of the previous section in the model configuration. A point to note is that peak GPU utilization on the GPU shot up to 74% (A100 in this case) while just using a single model instance with dynamic batching. Adding one more instance will definitely improve performance but linear perf scaling will not be achieved in this case.
+* **Dynamic Batching with multiple model instances**: To set up the Triton Server in this configuration, add `instance_group` in `config.pbtxt` (see the sketch after this diff) and make sure to include `--gpus=1` in the `docker run` command that sets up the server. Include `dynamic_batching` in the model configuration per the instructions of the previous section. Note that peak GPU utilization shot up to 74% (on an A100 in this case) while using just a single model instance with dynamic batching. Adding one more instance will improve performance, but linear performance scaling will not be achieved in this case.

 ```
 # Query
Conceptual_Guide/Part_2-improving_resource_utilization/model_repository/text_recognition/config.pbtxt
1 addition, 1 deletion

@@ -36,7 +36,7 @@ input [
 ]
 output [
     {
-        name: "308"
+        name: "307"
        data_type: TYPE_FP32
        dims: [ 26, 37 ]
     }

Conceptual_Guide/Part_5-Model_Ensembles/README.md
2 additions, 2 deletions

@@ -316,10 +316,10 @@ ensemble_scheduling {
 We'll again be launching Triton using docker containers. This time, we'll start an interactive session within the container instead of directly launching the triton server.

 ```bash
-docker run --gpus=all -it --shm-size=256m --rm \
+docker run --gpus=all -it --shm-size=512m --rm \
   -p8000:8000 -p8001:8001 -p8002:8002 \
   -v ${PWD}:/workspace/ -v ${PWD}/model_repository:/models \
-  nvcr.io/nvidia/tritonserver:22.12-py3
+  nvcr.io/nvidia/tritonserver:yy.mm-py3
 ```

 We'll need to install a couple of dependencies for our Python backend scripts.
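Since this session is interactive, the server is not started automatically. After replacing the `yy.mm` placeholder with a concrete Triton release tag from the NGC catalog and installing the dependencies the tutorial mentions, the server is then typically launched from inside the container with:

```bash
# Inside the container: serve the mounted model repository.
tritonserver --model-repository=/models
```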

Conceptual_Guide/Part_5-Model_Ensembles/model_repository/ensemble_model/config.pbtxt
1 addition, 1 deletion

@@ -100,7 +100,7 @@ ensemble_scheduling {
         value: "cropped_images"
       }
       output_map {
-        key: "308"
+        key: "307"
         value: "recognition_output"
       }
     },
Conceptual_Guide/Part_5-Model_Ensembles/model_repository/text_recognition/config.pbtxt
1 addition, 1 deletion

@@ -36,7 +36,7 @@ input [
 ]
 output [
     {
-        name: "308"
+        name: "307"
        data_type: TYPE_FP32
        dims: [ 26, 37 ]
     }

Conceptual_Guide/Part_5-Model_Ensembles/utils/export_text_recognition.py
1 addition, 1 deletion

@@ -45,5 +45,5 @@
     trace_input,
     model_directory / "model.onnx",
     verbose=True,
-    dynamic_axes={"input.1": [0], "308": [0]},
+    dynamic_axes={"input.1": [0], "307": [0]},
 )

HuggingFace/ensemble_model_repository/ensemble_model/config.pbtxt
0 additions, 9 deletions

@@ -40,11 +40,6 @@ output [
     name: "last_hidden_state"
     data_type: TYPE_FP32
     dims: [-1, -1]
-  },
-  {
-    name: "1519"
-    data_type: TYPE_FP32
-    dims: [768]
   }
 ]
 ensemble_scheduling {

@@ -72,10 +67,6 @@ ensemble_scheduling {
       key: "last_hidden_state"
       value: "last_hidden_state"
     }
-    output_map {
-      key: "1519"
-      value: "1519"
-    }
   }
 ]
 }

HuggingFace/python_model_repository/python_vit/1/model.py
4 additions, 4 deletions

@@ -30,22 +30,22 @@

 class TritonPythonModel:
     def initialize(self, args):
-        self.feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224-in21k').to("cuda")
+        self.feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224-in21k')
         self.model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k").to("cuda")

     def execute(self, requests):
         responses = []
         for request in requests:
             inp = pb_utils.get_input_tensor_by_name(request, "image")
             input_image = np.squeeze(inp.as_numpy()).transpose((2,0,1))
-            inputs = self.feature_extractor(images=input_image, return_tensors="pt")
+            inputs = self.feature_extractor(images=input_image, return_tensors="pt").to("cuda")

             outputs = self.model(**inputs)

             inference_response = pb_utils.InferenceResponse(output_tensors=[
                 pb_utils.Tensor(
-                    "label",
-                    outputs.last_hidden_state.numpy()
+                    "last_hidden_state",
+                    outputs.last_hidden_state.detach().cpu().numpy()
                 )
             ])
             responses.append(inference_response)