Getting Started

This page provides basic tutorials about the usage of MMTracking. For installation instructions, please see install.md.

Prepare datasets

  1. It is recommended to symlink the root of the datasets to $MMTRACKING/data. For example,

    mkdir data
    
    # object detection: symlink MS COCO
    ln -s $MSCOCO_ROOT/images data/coco/source_data
    ln -s $MSCOCO_ROOT/annotations data/coco/json_annotations
    
    # video object detection: symlink ImageNet DET and ImageNet VID
    ln -s $IMAGENETDET_IMAGENETVID_ROOT data/imagenetdet_imagenetvid/source_data
    
    # single object tracking: symlink LaSOT
    ln -s $LASOT_ROOT data/lasot/source_data
    
    # multiple object tracking: symlink MOT17
    ln -s $MOT17 data/MOT17
    

    Download the txt files required for the training of video object detection, and put them into data/imagenetdet_imagenetvid/data/Lists/.

  2. If your folder structure is different from the following, you may need to change the corresponding paths in config files.

    mmtracking
    ├── mmtrack
    ├── tools
    ├── configs
    ├── data
    │   ├── coco
    │   │   ├── source_data
    │   │   │   ├── train2017
    │   │   ├── json_annotations
    │   │   │   ├── instances_train2017.json
    │   ├── imagenetdet_imagenetvid
    │   │   ├── source_data
    │   │   │   ├── Data
    │   │   │   │   ├── DET
    │   │   │   │   ├── VID
    │   │   │   ├── Annotations
    │   │   │   │   ├── DET
    │   │   │   │   ├── VID
    │   │   ├── json_annotations
    │   │   │   ├── imagenet_det_30plus1cls.json (generated by tools/convert_datasets/imagenet2coco_det.py)
    │   │   │   ├── imagenet_vid_train.json (generated by tools/convert_datasets/imagenet2coco_vid.py)
    │   │   │   ├── imagenet_vid_val.json (generated by tools/convert_datasets/imagenet2coco_vid.py)
    │   ├── lasot
    │   │   ├── source_data
    │   │   │   ├── airplane-1
    │   │   │   ├── airplane-13
    │   │   ├── json_annotations
    │   │   │   ├── lasot_test.json (generated by tools/convert_datasets/lasot2coco.py)
    │   ├── MOT17
    │   │   ├── train
    │   │   ├── test
    
  3. Generate the json annotations for the ImageNet DET, ImageNet VID, LaSOT and MOT17 datasets. (The MS COCO annotations are already in json format and need no conversion.)

    # Generate imagenet_det_30plus1cls.json
    python ./tools/convert_datasets/imagenet2coco_det.py \
        -i ./data/imagenetdet_imagenetvid/source_data \
        -o ./data/imagenetdet_imagenetvid/json_annotations
    
    # Generate imagenet_vid_train.json and imagenet_vid_val.json
    python ./tools/convert_datasets/imagenet2coco_vid.py \
        -i ./data/imagenetdet_imagenetvid/source_data \
        -o ./data/imagenetdet_imagenetvid/json_annotations
    
    # Generate lasot_test.json
    python ./tools/convert_datasets/lasot2coco.py \
        -i ./data/lasot/source_data \
        -o ./data/lasot/json_annotations
    
    # Generate annotations files for MOT17
    python ./tools/convert_datasets/mot2coco.py \
        -i ./data/MOT17/ \
        -o ./data/MOT17/annotations \
        --split-train --convert-det
    

Inference with pretrained models

We provide testing scripts to evaluate a whole dataset, as well as some high-level APIs for easier integration into other projects.
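
For example, here is a minimal sketch of using the high-level API for multiple object tracking (demo.mp4 is a placeholder file, and the exact function signatures may vary across versions):

from mmtrack.apis import inference_mot, init_model
import mmcv

# Build the model from a config. For this Tracktor config the checkpoints
# are specified inside the config, so no checkpoint file is passed here.
config_file = 'configs/mot/tracktor/tracktor_faster-rcnn_r50_fpn_4e_mot17-public-half.py'
model = init_model(config_file, checkpoint=None, device='cuda:0')

# Feed frames in order; frame_id lets the tracker link identities across frames.
video = mmcv.VideoReader('demo.mp4')  # assumed local video file
for frame_id, frame in enumerate(video):
    result = inference_mot(model, frame, frame_id=frame_id)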

Test a dataset

  • single GPU
  • single node multiple GPU

You can use the following commands to test a dataset.

# single-gpu testing
python tools/test.py ${CONFIG_FILE} [--checkpoint ${CHECKPOINT_FILE}] [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${GPU_NUM} [--checkpoint ${CHECKPOINT_FILE}] [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

Optional arguments:

  • CHECKPOINT_FILE: Filename of the checkpoint. It is not needed for MOT tasks, where the checkpoints are specified in the config instead.
  • RESULT_FILE: Filename of the output results in pickle format. If not specified, the results will not be saved to a file. A saved file can be loaded back for offline analysis, as sketched after this list.
  • EVAL_METRICS: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., bbox is available for ImageNet VID, track is available for LaSOT and MOT17.
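
Since RESULT_FILE is a pickle file, it can be loaded back with mmcv. A minimal sketch (the layout of the results depends on the task):

import mmcv

# Load the raw results produced by tools/test.py --out results.pkl.
results = mmcv.load('results.pkl')
print(type(results))  # inspect the container; its layout depends on the task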

Examples:

Assume that you have already downloaded the checkpoints to the directory checkpoints/.

  1. Test DFF on ImageNet VID, and evaluate the bbox mAP.

    python tools/test.py configs/vid/dff/dff_faster_rcnn_r101_dc5_1x_imagenetvid.py \
        --checkpoint checkpoints/dff_faster_rcnn_r101_dc5_1x_imagenetvid_20201218_172720-ad732e17.pth \
        --out results.pkl \
        --eval bbox
  2. Test DFF with 8 GPUs on Slurm, and evaluate the bbox mAP.

    ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/vid/dff/dff_faster_rcnn_r101_dc5_1x_imagenetvid.py \
        --checkpoint checkpoints/dff_faster_rcnn_r101_dc5_1x_imagenetvid_20201218_172720-ad732e17.pth \
        --out results.pkl \
        --eval bbox
  3. Test SiameseRPN++ on LaSOT, and evaluate the success and normed precision.

    python tools/test.py configs/sot/siamese_rpn/siamese_rpn_r50_1x_lasot.py \
        --checkpoint checkpoints/siamese_rpn_r50_1x_lasot_20201218_051019-3c522eff.pth \
        --out results.pkl \
        --eval track
  4. Test SiameseRPN++ with 8 GPUs on Slurm, and evaluate the success and normed precision.

    ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/sot/siamese_rpn/siamese_rpn_r50_1x_lasot.py \
        --checkpoint checkpoints/siamese_rpn_r50_1x_lasot_20201218_051019-3c522eff.pth \
        --out results.pkl \
        --eval track
  5. Test Tracktor on MOT17, and evaluate CLEAR MOT metrics.

    python tools/test.py configs/mot/tracktor/tracktor_faster-rcnn_r50_fpn_4e_mot17-public-half.py \
        --eval track
  6. Test Tracktor with 8 GPUs on Slurm, and evaluate CLEAR MOT metrics.

    ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} \
        configs/mot/tracktor/tracktor_faster-rcnn_r50_fpn_4e_mot17-public-half.py \
        --eval track

Train a model

MMTracking implements distributed training and non-distributed training, which use MMDistributedDataParallel and MMDataParallel respectively.

All outputs (log files and checkpoints) will be saved to the working directory, which is specified by work_dir in the config file.

By default, we evaluate the model on the validation set after each epoch. You can change the evaluation interval by modifying the interval argument in the training config.

evaluation = dict(interval=12)  # This evaluates the model every 12 epochs.

Important: The default learning rate in config files is for 8 GPUs. According to the Linear Scaling Rule, you need to set the learning rate proportional to the total batch size if you use a different number of GPUs or images per GPU, e.g., lr=0.01 for 8 GPUs * 1 img/gpu and lr=0.04 for 16 GPUs * 2 img/gpu.
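
As a worked sketch of the rule using the numbers above (the optimizer line is illustrative, not copied from a specific config):

# Baseline in the provided configs: 8 GPUs * 1 img/gpu with lr=0.01.
base_lr, base_batch = 0.01, 8 * 1
# Example target setup: 16 GPUs * 2 img/gpu.
lr = base_lr * (16 * 2) / base_batch  # -> 0.04, matching the rule above

optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)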

Train with a single GPU

python tools/train.py ${CONFIG_FILE} [optional arguments]

If you want to specify the working directory in the command, you can add the argument --work-dir ${YOUR_WORK_DIR}.

Train with multiple GPUs

./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]

Optional arguments are:

  • --no-validate (not suggested): By default, the codebase performs evaluation every k epochs during training (the default value of k is 1 and can be modified in the config). To disable this behavior, use --no-validate.
  • --work-dir ${WORK_DIR}: Override the working directory specified in the config file.
  • --resume-from ${CHECKPOINT_FILE}: Resume from a previous checkpoint file.
  • --options 'Key=value': Override some settings in the used config.

Difference between resume-from and load-from: resume-from loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming a training process that was interrupted accidentally. load-from only loads the model weights, and the training epoch starts from 0. It is usually used for finetuning.
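
For example, a finetuning setup typically sets load_from in the config, while resuming is done from the command line; the paths below are placeholders:

# In the config: load weights only, so training starts from epoch 0.
load_from = 'checkpoints/pretrained_model.pth'  # placeholder path

# Setting resume_from (or passing --resume-from) instead also restores
# the optimizer status and the epoch number.
resume_from = None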

Train with multiple machines

If you run MMTracking on a cluster managed with Slurm, you can use the script slurm_train.sh. (This script also supports single-machine training.)

[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}

Here is an example of using 16 GPUs to train DFF on the dev partition.

GPUS=16 ./tools/slurm_train.sh dev dff_r101_1x configs/vid/dff/dff_faster_rcnn_r101_dc5_1x_imagenetvid.py /nfs/xxxx/dff_faster_rcnn_r101_dc5_1x_imagenetvid

You can check slurm_train.sh for full arguments and environment variables.

If you have multiple machines connected only with Ethernet, you can refer to the PyTorch launch utility. Training is usually slow if you do not have high-speed networking like InfiniBand.

Launch multiple jobs on a single machine

If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflict.

If you use dist_train.sh to launch training jobs, you can set the port in commands.

CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4

If you launch training jobs with Slurm, there are two ways to specify the ports.

  1. Set the port through --options. This is recommended since it does not change the original configs.

    CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --options 'dist_params.port=29500'
    CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --options 'dist_params.port=29501'
  2. Modify the config files (usually the 6th line from the bottom) to set different communication ports.

    In config1.py,

    dist_params = dict(backend='nccl', port=29500)

    In config2.py,

    dist_params = dict(backend='nccl', port=29501)

    Then you can launch two jobs with config1.py and config2.py.

    CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
    CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}

Useful tools

We provide lots of useful tools under the tools/ directory.

Publish a model

Before you upload a model to AWS, you may want to (1) convert the model weights to CPU tensors, (2) delete the optimizer states, and (3) compute the hash of the checkpoint file and append the hash id to the filename.

python tools/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}

E.g.,

python tools/publish_model.py work_dirs/dff_faster_rcnn_r101_dc5_1x_imagenetvid/latest.pth dff_faster_rcnn_r101_dc5_1x_imagenetvid_20201230.pth

The final output filename will be dff_faster_rcnn_r101_dc5_1x_imagenetvid_20201230-{hash id}.pth.
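
For reference, steps (1)-(3) roughly correspond to the following sketch; the actual script may differ in details:

import subprocess
import torch

# (1) Load the checkpoint onto the CPU so the published weights are CPU tensors.
checkpoint = torch.load('latest.pth', map_location='cpu')
# (2) Delete the optimizer states to shrink the file.
checkpoint.pop('optimizer', None)
torch.save(checkpoint, 'published.pth')
# (3) Hash the file and append the first 8 hex digits to the filename.
sha = subprocess.check_output(['sha256sum', 'published.pth']).decode()
subprocess.run(['mv', 'published.pth', 'published-{}.pth'.format(sha[:8])], check=True)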