
Mask-rcnn training - all AP and Recall scores in “IoU Metric: segm” remain 0 #3811

Open
hemasunder opened this issue May 11, 2021 · 2 comments


hemasunder commented May 11, 2021

I am trying to train torchvision’s pre-trained Mask R-CNN model on a custom dataset prepared in COCO format.

I am using the train_one_epoch and evaluate helpers from torchvision’s references/detection/engine.py for training and evaluation, respectively.

The loss_mask metric is decreasing, as can be seen here:

Epoch: [5]  [ 0/20]  eta: 0:00:54  lr: 0.005000  loss: 0.5001 (0.5001)  loss_classifier: 0.2200 (0.2200)  loss_box_reg: 0.2616 (0.2616)  loss_mask: 0.0014 (0.0014)  loss_objectness: 0.0051 (0.0051)  loss_rpn_box_reg: 0.0120 (0.0120)  time: 2.7308  data: 1.2866  max mem: 9887
Epoch: [5]  [10/20]  eta: 0:00:26  lr: 0.005000  loss: 0.4734 (0.4982)  loss_classifier: 0.2055 (0.2208)  loss_box_reg: 0.2515 (0.2595)  loss_mask: 0.0012 (0.0013)  loss_objectness: 0.0038 (0.0054)  loss_rpn_box_reg: 0.0094 (0.0113)  time: 2.6218  data: 1.1780  max mem: 9887
Epoch: [5]  [19/20]  eta: 0:00:02  lr: 0.005000  loss: 0.5162 (0.5406)  loss_classifier: 0.2200 (0.2384)  loss_box_reg: 0.2616 (0.2820)  loss_mask: 0.0014 (0.0013)  loss_objectness: 0.0051 (0.0062)  loss_rpn_box_reg: 0.0120 (0.0127)  time: 2.6099  data: 1.1755  max mem: 9887

But the evaluate output shows no improvement from zero in any of the segm IoU metrics:

IoU metric: bbox

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.653
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.843
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.723
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.788
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.325
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.701
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.738
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.739
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.832
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.456
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000

The segm metrics don’t improve even after training for 500 epochs.

When I visualize the masks predicted after 100 or 500 epochs of training, they show only a couple of scattered dots.

With the same dataset and annotations JSON, I was able to train an instance segmentation model on Detectron2, and there the segmentation IoU metrics clearly improved with each epoch.

Please suggest what needs to be done. I am posting here because there was no response on the discuss.pytorch forum for five days.
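For reference, when I visualize I first binarize the soft masks: torchvision’s Mask R-CNN returns per-instance mask probabilities of shape [N, 1, H, W] in its output dict, which need to be thresholded (0.5 is the usual choice) before display. A minimal sketch of that step (`binarize_masks` is my own helper, not a torchvision function):

```python
import torch

def binarize_masks(soft_masks, thresh=0.5):
    """Turn the model's soft masks (out["masks"], shape [N, 1, H, W],
    values in [0, 1]) into binary [N, H, W] uint8 masks for display."""
    return (soft_masks.squeeze(1) >= thresh).to(torch.uint8)
```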

cc @vfdev-5

datumbox (Contributor) commented

@hemasunder It's very hard to help you debug a problem just by looking at a log, but we can offer some hints on what to check. First, it seems that you have very little data at your disposal; you should be able to overfit it, provided you train the whole network end to end. The fact that you can't probably means that part of your custom dataset is not in the format the training scripts expect. I would start by checking that your data are loaded correctly and that the targets follow the same format as COCO.
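A sanity check along those lines might look like the sketch below. `check_sample` is an illustrative helper (not part of torchvision); the assertions reflect the target format torchvision's detection models expect: `boxes` as an [N, 4] float tensor in (x1, y1, x2, y2) order, `labels` as an [N] int64 tensor, and `masks` as a binary [N, H, W] uint8 tensor. All-zero or non-binary masks are a common cause of segm AP staying at 0 while bbox AP looks fine.

```python
import torch

def check_sample(image, target):
    # Image should be a float tensor with shape [C, H, W].
    assert image.dtype == torch.float32 and image.ndim == 3

    n = target["boxes"].shape[0]
    # Boxes: [N, 4] in (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    assert target["boxes"].shape == (n, 4)
    assert (target["boxes"][:, 2:] > target["boxes"][:, :2]).all()
    # Labels: [N] int64; 0 is reserved for background.
    assert target["labels"].shape == (n,) and target["labels"].dtype == torch.int64
    # Masks: [N, H, W] uint8 binary per-instance masks, matching the
    # image's spatial size. Mask R-CNN learns nothing useful from masks
    # that are empty, all-zero, or scaled differently from the image.
    assert target["masks"].shape == (n, image.shape[1], image.shape[2])
    assert target["masks"].max() <= 1, "masks must be binary 0/1"
    assert target["masks"].flatten(1).sum(1).min() > 0, "found an all-zero mask"
```

Running this over a few `(image, target)` pairs from the dataset before training should catch most format mismatches early.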

patches11 commented

I am having the same problem.

I believe the issue lies either in the transforms applied to the validation dataset, which coco_utils does not account for, or in coco_utils failing to translate the transformed masks back into a segmentation format that COCO understands.

Did you ever resolve this issue?
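If that hypothesis is right, the failure mode is that a geometric transform applied to a validation image is not mirrored on its masks, so predictions and ground truth no longer line up. A hedged sketch of doing it consistently in plain PyTorch (`resize_sample` is an illustrative helper, not one of the reference transforms):

```python
import torch
import torch.nn.functional as F

def resize_sample(image, masks, size):
    """Resize an image and its instance masks together.
    image: [C, H, W] float; masks: [N, H, W] uint8; size: (new_h, new_w).
    Nearest-neighbour interpolation keeps the masks strictly binary."""
    image = F.interpolate(image[None], size=size, mode="bilinear",
                          align_corners=False)[0]
    masks = F.interpolate(masks[None].float(), size=size, mode="nearest")[0]
    return image, masks.to(torch.uint8)
```

If the validation pipeline resizes or flips only the image, the COCO-format ground-truth masks stay at the original geometry and every predicted mask scores near-zero IoU, which would match the all-zero segm table above.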
