
add GN and GIoU loss for retinanet #4932

Closed
xiaohu2015 opened this issue Nov 14, 2021 · 15 comments

@xiaohu2015
Contributor

xiaohu2015 commented Nov 14, 2021

🚀 The feature

  • support GroupNorm (GN) on the RetinaNet head;
  • support GIoU loss as the regression loss;

Motivation, pitch

RetinaNet is known to achieve better mAP when using GN on the head and GIoU loss for box regression.

Alternatives

No response

Additional context

No response

cc @datumbox

@oke-aditya
Contributor

Hi @xiaohu2015 Thanks a lot for the proposal.

Can you please clarify what GN is?

gIoU is currently supported as torchvision.ops.generalized_box_iou (this name was kept for DETR compatibility),
and I guess creating a loss function from the op is straightforward?
I think it is 1 - ops.generalized_box_iou?

For loss functions there is RFC #2980, and I believe a gIoU loss is proposed there.

@xiaohu2015
Contributor Author

@oke-aditya GroupNorm

@xiaohu2015
Contributor Author

@oke-aditya I think the PR is more about a performance enhancement for the RetinaNet model.

@oke-aditya
Contributor

Yes, I think it would be great to improve the model. Do you have some statistics on the new mAP possible with GroupNorm and gIoU? I'm asking so that I could give it a read 😅

@xiaohu2015
Contributor Author

@oke-aditya I think mAP can increase by about 1–2 points if we use GN and GIoU loss.

@xiaohu2015
Contributor Author

@oke-aditya I want to add a new model, retinanet_gn_resnet50_fpn, which uses GN on the head.

@datumbox
Contributor

@xiaohu2015 Thanks for the proposal.

What other changes do you propose to make to the standard retinanet+resnet50 model builder other than passing GroupNorm? If the only thing needed is to pass GN, I'm not sure it's worth adding another model. This is something that users of TorchVision can do easily on their own. Let me know what you think, thanks!

@xiaohu2015
Contributor Author

xiaohu2015 commented Nov 15, 2021

@datumbox I want to make the following changes:

  • add an argument norm_layer to RetinaNetClassificationHead and RetinaNetRegressionHead

so that users can use RetinaNetClassificationHead and RetinaNetRegressionHead to build a GN head:

    # norm_layer is called with the number of channels to normalize
    norm_layer = lambda channels: nn.GroupNorm(32, channels)
    head = RetinaNetHead(backbone.out_channels, anchor_generator.num_anchors_per_location()[0], num_classes, norm_layer)
    model = RetinaNet(backbone, num_classes, anchor_generator=anchor_generator, head=head, **kwargs)
  • add an argument box_reg_loss_type to RetinaNetRegressionHead to support GIoU as the regression loss.

Do you think this is valuable?

@datumbox
Contributor

@xiaohu2015 Thanks for the clarifications and for checking in before making changes.

add an argument norm_layer to RetinaNetClassificationHead and RetinaNetRegressionHead

Neither of the two subnetworks actually uses norm layers. The original RetinaNet paper (page 5) describes the classification and regression subnets as simple CNNs composed of Conv and ReLU layers. This is why the two subclasses don't receive such an argument. Networks that do use them (for example SSDlite) support it.

Note that TorchVision provides canonical implementations of models (i.e. we try to align as closely to the paper as possible) while at the same time giving users the option to expand/adapt them to their needs. In this specific case, users who want to add normalization layers to the heads can pass a custom RetinaNetHead implementation. As this deviates from the canonical implementation, I think it is something that should be done on the user side.

add an argument box_reg_loss_type to RetinaNetRegressionHead to support GIoU as the regression loss.

This again can be achieved by passing a custom head that implements the loss of choice. I do get that this is suboptimal, though. Currently the detection models work somewhat differently from Classification and Segmentation, where the loss is defined outside the network by the user. The idea of removing it from the heads has been discussed before, and that's a direction we might want to take in the future. The key challenge is that the loss often requires a lot of additional information, which makes for a messy API.

@fmassa I think you are currently looking into making the detection models behave more like any other model by removing the transforms from them. Are you also experimenting with the losses, or is that out of scope for the investigation?

Do you think this is valuable?

Just to clarify: your recommendations make sense and they are valuable. Here we are trying to maintain a delicate balance between keeping the API reasonable and easy to use vs giving power users the option to customize the implementation. Sometimes there is no perfect solution for both. Add the Backwards Compatibility guarantees of TorchVision into the mix and you get a very hard problem to solve.

@xiaohu2015
Contributor Author

@datumbox I understand, but I still think we can make some changes to support these two features, which would be very helpful to users.
By the way, I also want to add FCOS to torchvision, but I don't know whether anyone has plans for that work.

@datumbox
Contributor

I understand, but I still think we can make some changes to support these two features, which would be very helpful to users.

We will need more time to discuss this in detail and see how it fits into our plans for revamping the detection models. If the revamp work enables users in some other way, we should minimize introducing new parameters to the API to avoid breaking changes.

By the way, I also want to add FCOS to torchvision, but I don't know whether anyone has plans for that work.

Thanks for the suggestion. The FCOS model wasn't in the list of proposed models maintained at #2707, so I just added it. It's certainly quite popular with lots of citations, so it checks that box. Let's continue the discussion of the pros/cons of supporting this architecture on that issue. We also have plans to add DETR, which goes in the same direction of eliminating anchor boxes.

@datumbox
Contributor

@xiaohu2015 Just to clarify one part. You have submitted quite a few proposals that require training models. I was wondering: if we do decide to move forward with one of them, do you have access to the necessary hardware to provide pre-trained weights? This is usually too much to ask of contributors, but if you have access to such a cluster, it does simplify things and can help us unblock some of these proposals. Thoughts?

@xiaohu2015
Contributor Author

@datumbox Although FCOS is anchor-free, it can be implemented using an anchor_generator (1 anchor per cell/position), just like in detectron2.

@datumbox
Contributor

datumbox commented Apr 2, 2022

@xiaohu2015 I don't know if you saw but #5444 investigates your proposal. If we merge it, GN and GIoU will be possible to use.

@xiaohu2015
Contributor Author

@xiaohu2015 I don't know if you saw but #5444 investigates your proposal. If we merge it, GN and GIoU will be possible to use.

Yes, I think this issue will be solved by PR #5444, so I'll close it.
