Here is an article talking a little bit about this project. Feel free to skip; the interesting stuff is all here.
First of all, configure AWS' Command Line Interface locally so you can access the AWS account. Here is more information about configuring the CLI.
Now, create a repository in AWS ECR to host the docker image. You can do this using either the web interface or the command line, as follows. Make sure you specify a name for your repository in place of <repository-name>:
aws ecr create-repository --repository-name <repository-name>
If you don't have your AWS account identifier around, you can run the following command to get it. You are going to need the account number to build and push the docker image to ECR:
aws sts get-caller-identity --query Account
To upload your docker image to ECR, you need to log in first from your terminal. The following command, available in version 1 of the AWS CLI (version 2 replaces get-login with get-login-password), authenticates your session. Make sure you replace <account-id> with your account identifier from the previous step:
$(aws ecr get-login --no-include-email --registry-ids <account-id>)
Use the following command to build the docker image:
docker build \
    -t <account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>:1.15.0-gpu \
    --build-arg ARCHITECTURE=1.15.0-gpu .
Notice that you have to replace <account-id> with your AWS account identifier, <region> with the AWS region where you created the repository, and <repository-name> with the name of the repository that you created before. Also, the ARCHITECTURE build argument specifies the tag of the base image you want to use to construct your container. For example:
- To build an image from the latest version of TensorFlow with GPU support, set ARCHITECTURE to the GPU tag of that version (the command above uses 1.15.0-gpu).
- To build an image from the latest version of TensorFlow with CPU support, set ARCHITECTURE to the CPU tag of that version (for example, 1.15.0 for TensorFlow 1.15.0).
After the image finishes building, you can push it up to ECR. The image is quite large, so expect the operation to take a few minutes depending on the speed of your connection:
docker push <account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>:1.15.0-gpu
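The image name used by the build and push commands follows the naming scheme of ECR registries. As a minimal sketch, the helper below assembles that name from its parts. The function name ecr_image_uri is hypothetical (it is not part of this project), and the region is an assumption that has to match wherever you created the repository.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Return the full image name expected by an ECR registry.

    The format is <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>.
    """
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


# Placeholder values for illustration only; use your own account, region and repository.
example = ecr_image_uri("123456789012", "us-east-1", "my-repository", "1.15.0-gpu")
print(example)
```

The same string can be reused later when SageMaker asks for the container path of the image.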
Now that your docker image is out of the way, you have to create and upload all the relevant resources to S3 so SageMaker can make them available to your image. Assuming you want to train your model on SageMaker (instead of just serving a model that was already trained), here is what you need to upload:
- TFRecord files with your training and validation data. You can read more about TFRecords here. These files contain your training and validation images plus annotations.
- The pipeline configuration file. This file contains the configuration of the specific algorithm that you use for training. A template of the appropriate file comes with each one of the pre-trained models offered on this repository.
- The pre-trained model checkpoint. If you are planning to use Transfer Learning to kickstart your training, you need to provide the pre-trained model checkpoint so the algorithm can start from there.
- The list of labels you are trying to detect. This is usually provided as a .pbtxt label map file. Here is an example illustrating the format of this file.
To upload all of these files, create a bucket in S3 and follow these instructions:
- The TFRecord files with your training and validation data can be named however you like. You can also upload multiple TFRecord files (following TensorFlow's convention of naming these files as yourfile.record-00000-of-00010, and so on).
- The pipeline configuration file should be uploaded with the name pipeline.config. If you wish to change this name, you'll have to modify it in the train file that's part of the docker image. Before uploading this file to S3, make sure you change it accordingly to prepare your training and validation job.
- Upload the entire pre-trained model that you got from the model zoo page. Unzip it first, and upload the entire folder. It can be named however you'd like.
- The label map file should be uploaded as label_map.pbtxt. If you wish to change this name, you'll have to modify it in the train files that are part of the docker image. But don't do that unless you want to start messing with stuff.
At the end, the content of your S3 bucket should look something like this:
- faster_rcnn_resnet50_coco_2018_01_28/
- training.record
- validation.record
- pipeline.config
- label_map.pbtxt
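If you would rather script the upload than use the web interface, the following is a minimal sketch under a few assumptions: all the files sit in one local folder that mirrors the bucket layout shown above, and the client object is a boto3 S3 client. The function names plan_uploads and upload_directory are hypothetical helpers, not part of this project.

```python
import os


def plan_uploads(local_dir: str) -> list:
    """Return (local_path, s3_key) pairs for every file under local_dir.

    Keys keep the relative folder layout, so the files of the pre-trained
    model stay inside their folder in the bucket.
    """
    pairs = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            local_path = os.path.join(root, name)
            key = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            pairs.append((local_path, key))
    # Sort by key so the upload order is predictable.
    return sorted(pairs, key=lambda pair: pair[1])


def upload_directory(s3_client, local_dir: str, bucket: str) -> int:
    """Upload every planned file with the client's upload_file method."""
    pairs = plan_uploads(local_dir)
    for local_path, key in pairs:
        s3_client.upload_file(local_path, bucket, key)
    return len(pairs)


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   upload_directory(boto3.client("s3"), "<local-folder>", "<bucket-name>")
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before touching a real bucket.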
To train a model using the image you just uploaded, you need to create a training job using SageMaker's web interface. Here are the relevant sections of the configuration of the training job that you need to update:
Algorithm source: Here you select "Your own algorithm container in ECR". In the container path field, specify the image you pushed to ECR, using the same name and tag that you used in the docker push command. (Make sure you replace the <account-id> and <repository-name> placeholders. Also, make sure you specify the proper version of your image.)
Resource configuration: Select an appropriate instance type for your training needs (make sure it's Accelerated Computing so it enables GPU training), and add enough additional storage volume to your instance. Failing to supply enough storage space will cause the training job to fail because it won't be able to download the necessary files.
Hyperparameters: The following hyperparameters are supported by the training script embedded in your docker image:
- The number of training steps that you want the Object Detection algorithm to use to train your model. If not specified, this value defaults to 100.
- quantize: Whether you want to also generate a TensorFlow Lite (TFLite) model that can be run on mobile devices. If not specified, this value defaults to False.
- The size that images will be resized to if we are generating a TFLite model (both width and height will be set to this size). This parameter is ignored if quantize is False. If not specified, this value defaults to 300.
- The type of inference that will be used if we are generating a TFLite model. This should be one of QUANTIZED_UINT8 or FLOAT. This parameter is ignored if quantize is False. If not specified, this value defaults to FLOAT.
Input data configuration: We want to create a couple of channels under this section to allow SageMaker to expose the necessary resources to our docker image (it does so by "mounting" a volume in our docker image so we can access the files directly from there):
- training: This channel exposes our data and configuration files to our docker image. Make sure to set the channel name property to training, the input mode to File, the data source to S3, and the S3 location to the S3 bucket we created before.
- checkpoint: The second channel exposes our pre-trained network to our docker image. Set the channel name property to checkpoint, the input mode to File, the data source to S3, and the S3 location to the S3 folder that contains our pre-trained network files (this would point to the faster_rcnn_resnet50_coco_2018_01_28/ folder in our example above).
Output data configuration: When the model finishes training, SageMaker will upload the results to this location. Set this field to the S3 location where you want to store the output of the training process.
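For context on the hyperparameters listed above: SageMaker hands them to a custom container as a JSON file at /opt/ml/input/config/hyperparameters.json, with every value encoded as a string. The sketch below shows one way a training script could read that file and apply the defaults from the list. It is not the project's actual code. Only quantize is named in the text; num_steps, image_size and inference_type are hypothetical names chosen for this example.

```python
import json

# Defaults taken from the hyperparameter list. Only "quantize" is named in
# the text; the other three keys are hypothetical names for this sketch.
DEFAULTS = {
    "num_steps": 100,
    "quantize": False,
    "image_size": 300,
    "inference_type": "FLOAT",
}


def load_hyperparameters(path="/opt/ml/input/config/hyperparameters.json"):
    """Read SageMaker's hyperparameter file and fall back to the defaults."""
    try:
        with open(path) as handle:
            raw = json.load(handle)
    except FileNotFoundError:
        raw = {}

    params = dict(DEFAULTS)
    if "num_steps" in raw:
        params["num_steps"] = int(raw["num_steps"])
    if "quantize" in raw:
        # Every value arrives as a string, so booleans are parsed by hand.
        params["quantize"] = str(raw["quantize"]).strip().lower() in ("true", "1", "yes")
    if "image_size" in raw:
        params["image_size"] = int(raw["image_size"])
    if "inference_type" in raw:
        value = str(raw["inference_type"]).upper()
        if value not in ("QUANTIZED_UINT8", "FLOAT"):
            raise ValueError(f"Unsupported inference type: {value}")
        params["inference_type"] = value
    return params
```

Parsing the boolean explicitly matters because a non-empty string such as "False" would otherwise be treated as true.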
At this point, and unless you want to tweak any of the other settings on the training job screen, you can create your training job. SageMaker will initiate the training process and will inform you when it finishes.
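The same training job can also be created from code instead of the web interface. The sketch below only builds the request for the create_training_job call of the boto3 SageMaker client, using the training and checkpoint channels described above. The function name and its parameters are hypothetical, and the 24-hour stopping condition, the single instance, and the S3Prefix and FullyReplicated settings are assumptions you may want to change.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               training_s3_uri, checkpoint_s3_uri, output_s3_uri,
                               instance_type, volume_size_gb, hyperparameters):
    """Build the keyword arguments for the SageMaker create_training_job call."""

    def channel(name, s3_uri):
        # One File-mode channel backed by an S3 prefix.
        return {
            "ChannelName": name,
            "InputMode": "File",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": s3_uri,
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }

    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # SageMaker expects every hyperparameter value to be a string.
        "HyperParameters": {key: str(value) for key, value in hyperparameters.items()},
        "InputDataConfig": [
            channel("training", training_s3_uri),
            channel("checkpoint", checkpoint_s3_uri),
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": volume_size_gb,
        },
        # Assumed limit of 24 hours; adjust it to your training needs.
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
    }


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   request = build_training_job_request(...)
#   boto3.client("sagemaker").create_training_job(**request)
```

With File input mode, SageMaker makes each channel available inside the container under /opt/ml/input/data/ followed by the channel name.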
At this point, your training job has finished and your model is ready. To get it up and running on SageMaker, create a new Notebook instance using SageMaker's web interface and open Jupyter as soon as it's ready. In that notebook, you can add the following code in a cell:
import time

import boto3
from sagemaker import get_execution_role

role = get_execution_role()
sagemaker = boto3.Session().client(service_name='sagemaker')

# Before anything else, we need to set these variables appropriately.
# TRAINING_JOB_NAME should be the name of the training job that we
# used to train the model.
# DOCKER_IMAGE should point to the docker image that trains and serves
# the model.
TRAINING_JOB_NAME = '<TRAINING_JOB_NAME>'
DOCKER_IMAGE = '<DOCKER_IMAGE>'

model_name = TRAINING_JOB_NAME
training_job = sagemaker.describe_training_job(TrainingJobName=TRAINING_JOB_NAME)

# Let's now create the model definition in SageMaker.
response = sagemaker.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer={
        'Image': DOCKER_IMAGE,
        'ModelDataUrl': training_job['ModelArtifacts']['S3ModelArtifacts'],
    }
)
print('Model:', response['ModelArn'])

# Now, we need to create an Endpoint configuration. Make sure to select
# an appropriate instance type to run inference on the model.
# The variant name and instance count below are illustrative values.
endpoint_configuration_name = TRAINING_JOB_NAME + '-endpoint-configuration'
response = sagemaker.create_endpoint_config(
    EndpointConfigName=endpoint_configuration_name,
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': model_name,
        'InitialInstanceCount': 1,
        'InstanceType': '<INSTANCE_TYPE>',
    }]
)
print('Endpoint configuration:', response['EndpointConfigArn'])

# Finally, we can create the Endpoint that will serve our model. This is going to
# provision the appropriate instance and deploy our docker image so it can start
# serving. This will take a while.
endpoint_name = TRAINING_JOB_NAME + '-endpoint'
response = sagemaker.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_configuration_name
)
print('Creating endpoint', response['EndpointArn'])

status = 'Creating'
while status == 'Creating':
    # Let's wait until the status of the endpoint changes, pausing
    # between checks so we don't flood the API with requests.
    time.sleep(30)
    response = sagemaker.describe_endpoint(EndpointName=endpoint_name)
    status = response['EndpointStatus']
    print('Endpoint status:', status)

if status != 'InService':
    print('Endpoint creation failed')
It will take some time to create and provision the instance running the docker image, but when it's done, you should be ready to test it with the code below.
First, let's define a simple function to visualize the predictions (detections) made by the model:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

def visualize_predictions(image_file, predictions, classes=(), threshold=0.6):
    image = mpimg.imread(image_file)
    plt.figure(figsize=(20, 20))
    plt.imshow(image)
    height = image.shape[0]
    width = image.shape[1]

    for prediction in predictions['prediction']:
        (class_id, confidence, x0, y0, x1, y1) = prediction
        if confidence >= threshold:
            # The model uses base 1 to represent the different classes. It's
            # weird, I know, so let's take it to base 0.
            class_id = int(class_id) - 1

            # Coordinates come back normalized, so scale them to pixels.
            xmin = int(x0 * width)
            ymin = int(y0 * height)
            xmax = int(x1 * width)
            ymax = int(y1 * height)

            # The outline color and width are illustrative choices.
            rectangle = plt.Rectangle(
                (xmin, ymin),
                xmax - xmin,
                ymax - ymin,
                fill=False,
                edgecolor='r',
                linewidth=2)
            plt.gca().add_patch(rectangle)

            class_name = classes[class_id]
            plt.gca().text(
                xmin,
                ymin - 2,
                '{:s} {:.3f}'.format(class_name, confidence),
                bbox=dict(facecolor='r', alpha=0.5), fontsize=12, color='white')

    plt.show()
Then, we can download a sample image and invoke the model endpoint to get the list of predictions back (make sure you replace <IMAGE_URL> with the URL of your favorite cat picture or whatever the model is ready to infer):
import json
import base64

# Let's download a sample image from the web and save it locally as test.jpg.
!wget -O test.jpg <IMAGE_URL>

image_file = 'test.jpg'
with open(image_file, "rb") as image:
    encoded_string = base64.b64encode(image.read())

# Now we can invoke the endpoint providing the image as a base64 string.
# The JSON content type is an assumption; use whatever the serving code
# in your docker image expects.
runtime = boto3.client('sagemaker-runtime')
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=json.dumps({
        "image": encoded_string.decode('utf-8')
    })
)

# If everything works as expected, the response will contain a Body property
# containing the list of predictions inside.
predictions = json.loads(response['Body'].read().decode("utf-8"))

visualize_predictions(
    image_file,
    predictions,
    classes=['Class 1', 'Class 2'])
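If you want the detections as data rather than as a plot, the following sketch applies the same threshold and coordinate scaling used by the visualization function. It assumes the response format shown above, where each entry under prediction holds a class id, a confidence and four normalized coordinates. The function name to_pixel_boxes is a hypothetical helper.

```python
def to_pixel_boxes(predictions, width, height, classes, threshold=0.6):
    """Convert the endpoint's normalized detections into labeled pixel boxes."""
    boxes = []
    for class_id, confidence, x0, y0, x1, y1 in predictions["prediction"]:
        if confidence < threshold:
            continue
        boxes.append({
            # Class ids are 1-based, so shift them to index the list of names.
            "label": classes[int(class_id) - 1],
            "confidence": confidence,
            # (xmin, ymin, xmax, ymax) in pixels.
            "box": (int(x0 * width), int(y0 * height),
                    int(x1 * width), int(y1 * height)),
        })
    return boxes
```

A list like this is easier to log, store or feed into another step than a rendered figure.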
You can run inference using your trained model locally by running the docker image on your computer. This is also useful when you want to deploy your model on-premises and don't want to rely on AWS to use it.
To do this, first you'll need to download your trained model from S3. SageMaker saved your trained model in the S3 location that you specified when configuring your training job. Inside the folder that you specified, there's a model.tar.gz file that you need to download and untar locally. Mount the resulting folder into the docker container so it can use your model to run inference on images.
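Unpacking the archive can also be done from Python. This is a minimal sketch that assumes model.tar.gz has already been downloaded to your machine. The function name extract_model is hypothetical, and the download call in the trailing comment uses placeholders for the bucket and key.

```python
import os
import tarfile


def extract_model(archive_path: str, target_dir: str) -> list:
    """Unpack model.tar.gz into target_dir and return the archive's member names."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(target_dir)
    return names


# Usage (the download requires boto3 and configured AWS credentials):
#   import boto3
#   boto3.client("s3").download_file("<bucket-name>", "<key-of-model.tar.gz>", "model.tar.gz")
#   extract_model("model.tar.gz", "<local-path-to-model-folder>")
```

The target folder is the one to mount at /opt/ml/model when starting the container.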
You can run the docker container that you built before, or you can build a new one using a different tag:
docker build \
-t tensorflow-object-detection:1.15.0-cpu \
--build-arg ARCHITECTURE=1.15.0 .
Starting the docker image with the serve command starts a gunicorn server listening for HTTP requests to the /invocations location. This server listens on port 8080, so we need to make sure to map that port to a local port.
Finally, when running locally, you can specify the timeout (MODEL_SERVER_TIMEOUT) and the number of workers (MODEL_SERVER_WORKERS) that gunicorn uses through environment variables.
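To illustrate how such settings are typically consumed, here is a sketch of a serve script reading both variables. This is not the project's actual code: the fallback values (60 seconds and one worker per CPU core) are assumptions, and server_settings is a hypothetical name.

```python
import multiprocessing


def server_settings(environ):
    """Read the gunicorn timeout and worker count from an environment mapping."""
    # Fallbacks are assumptions: 60 seconds and one worker per CPU core.
    timeout = int(environ.get("MODEL_SERVER_TIMEOUT", 60))
    workers = int(environ.get("MODEL_SERVER_WORKERS", multiprocessing.cpu_count()))
    return {"timeout": timeout, "workers": workers}


# Inside a container this would be called as server_settings(os.environ).
print(server_settings({"MODEL_SERVER_TIMEOUT": "120", "MODEL_SERVER_WORKERS": "2"}))
```

When starting the container, values like these can be supplied with docker's -e option, for example -e MODEL_SERVER_TIMEOUT=120.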
Here is an example command that runs the docker container and makes it listen to port 8080 locally:
docker run \
-p 8080:8080 \
-v <local-path-to-model-folder>:/opt/ml/model \
--name "tensorflow-object-detection" \
tensorflow-object-detection:1.15.0-cpu serve
After having the docker container running, you can use the following script to run inference on an image (the script depends on the requests library, which you can install by running pip install requests):
import base64
import requests

if __name__ == '__main__':
    image_file = '<LOCAL_IMAGE_FILE>'
    with open(image_file, "rb") as image:
        encoded_string = base64.b64encode(image.read())

    body = {
        "image": encoded_string.decode('utf-8')
    }

    # The container started above listens on local port 8080 and serves
    # requests sent to /invocations.
    response = requests.post('http://localhost:8080/invocations', json=body)
    print(response.json())
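Both the SageMaker endpoint and the local container receive the image in the same JSON shape, so the encoding step can be factored into one small function. This is only a sketch, and build_invocation_body is a hypothetical name that is not part of the project.

```python
import base64
import json


def build_invocation_body(image_bytes: bytes) -> str:
    """Return the JSON request body carrying the image as a base64 string."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"image": encoded})


# Usage: read the image once and reuse the body for either target.
#   with open("<LOCAL_IMAGE_FILE>", "rb") as image:
#       body = build_invocation_body(image.read())
```

The result is a string, so it can be passed as the Body of invoke_endpoint or sent with requests using the data argument and a JSON content type header.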