
Mask-rcnn training - all AP and Recall scores in “IoU Metric: segm” remain 0 #3811

Open
hemasunder opened this issue May 11, 2021 · 2 comments


hemasunder commented May 11, 2021

I am trying to train torchvision’s pre-trained Mask R-CNN model on a custom dataset prepared in COCO format.

I am using the train_one_epoch and evaluate helpers from torchvision’s references/detection/engine.py for training and evaluation, respectively.

The loss_mask metric is decreasing, as can be seen here:

Epoch: [5]  [ 0/20]  eta: 0:00:54  lr: 0.005000  loss: 0.5001 (0.5001)  loss_classifier: 0.2200 (0.2200)  loss_box_reg: 0.2616 (0.2616)  loss_mask: 0.0014 (0.0014)  loss_objectness: 0.0051 (0.0051)  loss_rpn_box_reg: 0.0120 (0.0120)  time: 2.7308  data: 1.2866  max mem: 9887
Epoch: [5]  [10/20]  eta: 0:00:26  lr: 0.005000  loss: 0.4734 (0.4982)  loss_classifier: 0.2055 (0.2208)  loss_box_reg: 0.2515 (0.2595)  loss_mask: 0.0012 (0.0013)  loss_objectness: 0.0038 (0.0054)  loss_rpn_box_reg: 0.0094 (0.0113)  time: 2.6218  data: 1.1780  max mem: 9887
Epoch: [5]  [19/20]  eta: 0:00:02  lr: 0.005000  loss: 0.5162 (0.5406)  loss_classifier: 0.2200 (0.2384)  loss_box_reg: 0.2616 (0.2820)  loss_mask: 0.0014 (0.0013)  loss_objectness: 0.0051 (0.0062)  loss_rpn_box_reg: 0.0120 (0.0127)  time: 2.6099  data: 1.1755  max mem: 9887

But the evaluate output shows no improvement from zero in any of the segm IoU metrics:

IoU metric: bbox

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.653
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.843
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.723
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.788
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.325
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.701
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.738
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.739
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.832
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.456
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000

The segm metrics don’t improve even after training for 500 epochs.

When I visualize the masks predicted after 100 or 500 epochs of training, they show only a couple of scattered dots.

With the same dataset and annotations JSON, I was able to train an instance segmentation model on Detectron2, and there the segmentation IoU metrics clearly improved with each epoch.

Please suggest what needs to be done. I am posting here because there was no response on the discuss.pytorch forum for five days.
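For reference, when I visualize I first binarize the soft masks: torchvision’s Mask R-CNN returns per-instance mask probabilities of shape [N, 1, H, W] in its output dict, which need to be thresholded (0.5 is the usual choice) before display. A minimal sketch of that step (`binarize_masks` is my own helper, not a torchvision function):

```python
import torch

def binarize_masks(soft_masks, thresh=0.5):
    """Turn the model's soft masks (out["masks"], shape [N, 1, H, W],
    values in [0, 1]) into binary [N, H, W] uint8 masks for display."""
    return (soft_masks.squeeze(1) >= thresh).to(torch.uint8)
```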

cc @vfdev-5

datumbox (Contributor) commented

@hemasunder It's very hard to help you debug a problem just by looking at a log, but we can offer some hints on what to check. First, it seems that you have very little data at your disposal; you should be able to overfit it, provided you train the whole network end to end. The fact that you can't probably means that part of your custom dataset is not in the format the training scripts expect. I would start by checking that your data are loaded correctly and that the targets follow the same format as COCO.
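A sanity check along those lines might look like the sketch below. `check_sample` is an illustrative helper (not part of torchvision); the assertions reflect the target format torchvision's detection models expect: `boxes` as an [N, 4] float tensor in (x1, y1, x2, y2) order, `labels` as an [N] int64 tensor, and `masks` as a binary [N, H, W] uint8 tensor. All-zero or non-binary masks are a common cause of segm AP staying at 0 while bbox AP looks fine.

```python
import torch

def check_sample(image, target):
    # Image should be a float tensor with shape [C, H, W].
    assert image.dtype == torch.float32 and image.ndim == 3

    n = target["boxes"].shape[0]
    # Boxes: [N, 4] in (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    assert target["boxes"].shape == (n, 4)
    assert (target["boxes"][:, 2:] > target["boxes"][:, :2]).all()
    # Labels: [N] int64; 0 is reserved for background.
    assert target["labels"].shape == (n,) and target["labels"].dtype == torch.int64
    # Masks: [N, H, W] uint8 binary per-instance masks, matching the
    # image's spatial size. Mask R-CNN learns nothing useful from masks
    # that are empty, all-zero, or scaled differently from the image.
    assert target["masks"].shape == (n, image.shape[1], image.shape[2])
    assert target["masks"].max() <= 1, "masks must be binary 0/1"
    assert target["masks"].flatten(1).sum(1).min() > 0, "found an all-zero mask"
```

Running this over a few `(image, target)` pairs from the dataset before training should catch most format mismatches early.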

patches11 commented

I am having the same problem.

I believe the issue lies either in the transforms applied to the validation dataset, which coco_utils does not account for, or in coco_utils failing to translate the transformed masks back into a segmentation format that COCO understands.

Did you ever resolve this issue?
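If that hypothesis is right, the failure mode is that a geometric transform applied to a validation image is not mirrored on its masks, so predictions and ground truth no longer line up. A hedged sketch of doing it consistently in plain PyTorch (`resize_sample` is an illustrative helper, not one of the reference transforms):

```python
import torch
import torch.nn.functional as F

def resize_sample(image, masks, size):
    """Resize an image and its instance masks together.
    image: [C, H, W] float; masks: [N, H, W] uint8; size: (new_h, new_w).
    Nearest-neighbour interpolation keeps the masks strictly binary."""
    image = F.interpolate(image[None], size=size, mode="bilinear",
                          align_corners=False)[0]
    masks = F.interpolate(masks[None].float(), size=size, mode="nearest")[0]
    return image, masks.to(torch.uint8)
```

If the validation pipeline resizes or flips only the image, the COCO-format ground-truth masks stay at the original geometry and every predicted mask scores near-zero IoU, which would match the all-zero segm table above.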
