There seems to be a regression in resnet-18 inference time (when running on GPU) after this PR. It was caught in the MMS nightly runs, and the changes in this PR appear to be the cause.
We use MMS Docker images to run load tests. A local container can be started with the following command:

```bash
nvidia-docker run --name mms_benchmark_gpu -p 8080:8080 -p 8081:8081 -itd awsdeeplearningteam/mxnet-model-server:nightly-mxnet-gpu
```
MXNet was built with OpenCV 3.2 and CUDA 9.2.
Load testing was done with Locust. To install it:

```bash
# The script and CLI flags used below follow the pre-1.0 Locust API (HttpLocust, --no-web)
pip install locust
```
Download the test image:

```bash
curl -O
```
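The Locust script used for load testing:

```python
# test_resnet_!
# Uses the pre-1.0 Locust API (HttpLocust, TaskSet, min_wait/max_wait).
import os

from locust import HttpLocust, TaskSet, task

# Read the test image once; every request posts the same bytes.
with open(os.path.join(os.getcwd(), 'kitten.jpg'), 'rb') as f:
    data = f.read()


class PredictionTasks(TaskSet):
    @task
    def inference(self):
        # POST the raw JPEG to the resnet-18 prediction endpoint.
        self.client.post("/predictions/resnet-18", data=data,
                         headers={'Content-Type': 'image/jpeg'})


class Prediction(HttpLocust):
    task_set = PredictionTasks
    # Fixed 100 ms pause between consecutive requests of each simulated user.
    min_wait = 100
    max_wait = 100
```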
Running the load test

Registering and loading the model:

```bash
# Register and load resnet-18 model archive
curl -X POST
```
Start a single worker and run the latency test:

```bash
# Start worker and run latency test
$ curl -X PUT ''
$ locust -f Prediction --host= --no-web -c 1 -r 1 -t 20s --only-summary
```
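As an independent cross-check of the Locust numbers, a single-client latency probe can be written with only the Python standard library. This is a minimal sketch, not part of the MMS tooling: it assumes the inference API is reachable at `http://localhost:8080` (the port published by the `nvidia-docker run` command above), and the helper names `post_image`, `measure` and `summarize` are hypothetical. Percentiles use the nearest-rank method, which may differ slightly from the way Locust rounds response times.

```python
import math
import statistics
import time
import urllib.request
from typing import Callable, Dict, List

# Assumption: MMS inference API published on localhost:8080 by the
# `nvidia-docker run ... -p 8080:8080` command; adjust for your setup.
URL = "http://localhost:8080/predictions/resnet-18"


def post_image(body: bytes) -> None:
    """Send one JPEG to the prediction endpoint and read the whole response."""
    req = urllib.request.Request(URL, data=body,
                                 headers={"Content-Type": "image/jpeg"})
    with urllib.request.urlopen(req) as resp:
        resp.read()


def measure(send: Callable[[bytes], None], body: bytes,
            n: int = 100, wait_s: float = 0.1) -> List[float]:
    """Issue n sequential requests from one client; return latencies in ms."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        send(body)
        latencies.append((time.perf_counter() - start) * 1000.0)
        time.sleep(wait_s)  # mirrors min_wait = max_wait = 100 ms in the Locust script
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Average, median and nearest-rank 95th/99th percentiles, in ms."""
    if not latencies:
        raise ValueError("no latencies to summarize")
    ordered = sorted(latencies)

    def nearest_rank(p: int) -> float:
        # Smallest sample such that at least p% of all samples are <= it.
        k = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[k - 1]

    return {
        "avg": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": nearest_rank(95),
        "p99": nearest_rank(99),
    }
```

After reading `kitten.jpg` into a `bytes` object `image`, `summarize(measure(post_image, image))` sends 100 sequential requests with a 100 ms pause between them, which roughly matches `-c 1` with `min_wait = max_wait = 100`. Passing the sender as a parameter keeps the timing and statistics logic independent of the HTTP client.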
To change the MXNet version/build in the Docker image:
NOTE: By default, the most recent pip version is pulled.

```bash
# Open a root shell inside the container
nvidia-docker exec -u root -it mms_benchmark_gpu bash
$ pip uninstall mxnet-cu92mkl
$ pip install <new-build>.whl
# Press Ctrl+P, then Ctrl+Q to detach from the container (it keeps running)

# Destroy the existing worker and create a new one; the new worker loads the newly installed MXNet
$ curl -X PUT ''
$ curl -X PUT ''
```
On mxnet-cu92==1.3.0post0:
```
# locust result
Name                           # reqs     # fails   Avg   Min   Max  |  Median   req/s
POST /predictions/resnet-18       152    0(0.00%)    31    30    39  |      31    7.60
Total                             152    0(0.00%)                                 7.60

Percentage of the requests completed within given times
Name                           # reqs   50%   66%   75%   80%   90%   95%   98%   99%  100%
POST /predictions/resnet-18       152    31    31    31    31    32    33    33    34   280
Total                             152    31    31    31    31    32    33    33    34   280
```
On mxnet-cu92 built at commit f9f7416:
```
Name                           # reqs     # fails   Avg   Min   Max  |  Median   req/s
POST /predictions/resnet-18       141    0(0.00%)    41    37   337  |      38    7.20
Total                             141    0(0.00%)                                 7.20

Percentage of the requests completed within given times
Name                           # reqs   50%   66%   75%   80%   90%   95%   98%   99%  100%
POST /predictions/resnet-18       141    38    39    39    40    40    42    49    49   340
Total                             141    38    39    39    40    40    42    49    49   340
```
This regression therefore carries over to 1.3.1.
Based on the above results, average resnet-18 latency rises from 31 ms to 41 ms (roughly a 30% increase), and median latency from 31 ms to 38 ms.
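The percentages can be recomputed directly from the two Locust summaries. The short script below does that, and also checks the single-user throughput against a simple model, req/s ≈ users × 1000 / (avg latency in ms + wait in ms), where the wait is the fixed 100 ms from `min_wait`/`max_wait`. That model is an approximation assumed here for illustration, not something Locust reports, and `expected_rps` is a hypothetical helper.

```python
# Figures copied from the two Locust runs above (latencies in ms).
baseline = {"avg": 31, "median": 31, "p95": 33, "p99": 34, "req/s": 7.60}   # mxnet-cu92==1.3.0post0
regressed = {"avg": 41, "median": 38, "p95": 42, "p99": 49, "req/s": 7.20}  # commit f9f7416

WAIT_MS = 100  # min_wait = max_wait = 100 in the Locust script


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def expected_rps(avg_latency_ms: float, wait_ms: float = WAIT_MS, users: int = 1) -> float:
    """Approximate throughput when each user alternates one request and one fixed wait."""
    return users * 1000.0 / (avg_latency_ms + wait_ms)


changes = {metric: pct_change(baseline[metric], regressed[metric]) for metric in baseline}

for metric, delta in changes.items():
    print(f"{metric:>6}: {baseline[metric]:>5} -> {regressed[metric]:>5} ({delta:+.1f}%)")

for label, run in (("1.3.0post0", baseline), ("f9f7416", regressed)):
    print(f"{label}: model {expected_rps(run['avg']):.2f} req/s, observed {run['req/s']:.2f}")
```

With these inputs, average latency rises by about 32%, median by about 23% and the 99th percentile by about 44%, while req/s drops only about 5%. With a single user, throughput is dominated by the 100 ms wait, so the model predicts roughly 7.63 and 7.09 req/s, close to the observed 7.60 and 7.20; the req/s column therefore understates the latency regression.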