This document has instructions for running SSD-ResNet34 FP32 training using Intel-optimized TensorFlow.
SSD-ResNet34 training uses the COCO training dataset. Use the instructions to download and preprocess the dataset.
Script name | Description |
---|---|
fp32_training.sh | Runs 100 training steps using mpirun for the specified number of processes (defaults to MPI_NUM_PROCESSES=1). |
Set up your environment using the instructions below, depending on whether you are using AI Kit:
Setup using AI Kit | Setup without AI Kit |
---|---|
To run using AI Kit you will need: | To run without AI Kit you will need: |
For more information on the dependencies, see the installation instructions for object detection models at the TensorFlow Model Garden repository.
SSD-ResNet34 training uses code from the TensorFlow Model Garden. Clone the repo at the commit specified below, and set the TF_MODELS_DIR environment variable to point to that directory.
# Clone the tensorflow/models repo at the specified commit.
# Note that the required commit for this section is different from the one used for dataset preparation.
git clone https://github.com/tensorflow/models.git tf_models
cd tf_models
export TF_MODELS_DIR=$(pwd)
git checkout 8110bb64ca63c48d0caee9d565e5b4274db2220a
cd ..
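Because the commit used here differs from the one used for dataset preparation, it can be worth confirming that the clone is at the expected commit before training. The following is a minimal sketch; `verify_models_commit` is a hypothetical helper name and is not part of the model zoo or the tensorflow/models repo. It relies only on `git rev-parse HEAD`, which prints the hash of the checked-out commit.

```shell
# Hypothetical helper (an assumption for illustration, not part of the model zoo).
# Usage: verify_models_commit <repo_dir> <expected_commit>
verify_models_commit() {
  local repo_dir="$1"
  local expected="$2"
  local actual

  # git rev-parse HEAD prints the full hash of the checked-out commit.
  if ! actual=$(git -C "$repo_dir" rev-parse HEAD 2>/dev/null); then
    echo "Could not read a git commit from: $repo_dir" >&2
    return 1
  fi

  if [ "$actual" != "$expected" ]; then
    echo "Expected commit $expected but found $actual" >&2
    return 1
  fi

  echo "OK: $repo_dir is at $expected"
}
```

For example, calling `verify_models_commit "$TF_MODELS_DIR" 8110bb64ca63c48d0caee9d565e5b4274db2220a` after the checkout should print an OK line, and a non-zero return status suggests the checkout step needs to be repeated.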
Set the DATASET_DIR to point to the directory with COCO training TF records files and the OUTPUT_DIR to the location where log and checkpoint files will be written. Use an empty output directory to prevent checkpoint file conflicts from previous runs. You can optionally set the MPI_NUM_PROCESSES environment variable (defaults to 1). After all the setup is complete, run the quickstart script.
# cd to your model zoo directory
cd models
export TF_MODELS_DIR=<path to your clone of the TensorFlow models repo>
export DATASET_DIR=<path to the dataset>
export OUTPUT_DIR=<directory where log and checkpoint files will be written>
export MPI_NUM_PROCESSES=<number of MPI processes (optional, defaults to 1)>
./quickstart/object_detection/tensorflow/ssd-resnet34/training/cpu/fp32/fp32_training.sh
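The environment requirements described above can be checked before launching a run. The sketch below is one possible pre-flight check; `preflight_check` is a hypothetical function name and is not provided by the model zoo. It assumes only what this document states: TF_MODELS_DIR and DATASET_DIR should be existing directories, OUTPUT_DIR should be empty, and MPI_NUM_PROCESSES defaults to 1.

```shell
# Hypothetical pre-flight helper (an assumption for illustration, not part of the model zoo).
preflight_check() {
  local status=0

  if [ ! -d "${TF_MODELS_DIR:-}" ]; then
    echo "TF_MODELS_DIR must point to your clone of the TensorFlow models repo" >&2
    status=1
  fi

  if [ ! -d "${DATASET_DIR:-}" ]; then
    echo "DATASET_DIR must point to the directory with the COCO training TF records" >&2
    status=1
  fi

  if [ -z "${OUTPUT_DIR:-}" ]; then
    echo "OUTPUT_DIR must be set" >&2
    status=1
  elif [ -d "$OUTPUT_DIR" ] && [ -n "$(ls -A "$OUTPUT_DIR")" ]; then
    # Files left over from a previous run can conflict with new checkpoints.
    echo "OUTPUT_DIR is not empty: $OUTPUT_DIR" >&2
    status=1
  fi

  # Apply the documented default when the variable is unset or empty.
  export MPI_NUM_PROCESSES="${MPI_NUM_PROCESSES:-1}"

  return "$status"
}
```

One way to use it is to source the function and run `preflight_check` from the model zoo directory, launching the quickstart script only when it returns success.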
- To run more advanced use cases, see the instructions here for calling the launch_benchmark.py script directly.
- To run the model using Docker, see the oneContainer workload container: https://software.intel.com/content/www/us/en/develop/articles/containers/ssd-resnet34-fp32-training-tensorflow-container.html