add GN and GIoU loss for retinanet #4932
Comments
Hi @xiaohu2015 Thanks a lot for the proposal. Can you please clarify what GN is? gIoU is currently supported as an op. For loss functions there is an RFC #2980, and I guess gIoU loss is proposed there.
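For reference (not part of the original comment): GIoU is exposed in torchvision.ops as `generalized_box_iou`, which returns a pairwise matrix rather than a loss. A minimal sketch of wrapping it into a regression loss for matched prediction/target pairs, assuming boxes in (x1, y1, x2, y2) format:

```python
# Sketch: turn torchvision's pairwise GIoU op into a loss over N matched box pairs.
import torch
from torchvision.ops import generalized_box_iou


def giou_loss(pred_boxes: torch.Tensor, target_boxes: torch.Tensor) -> torch.Tensor:
    # generalized_box_iou returns the full NxM pairwise matrix; for N matched
    # pairs we only need its diagonal.
    giou = torch.diag(generalized_box_iou(pred_boxes, target_boxes))
    return (1.0 - giou).mean()


pred = torch.tensor([[10.0, 10.0, 50.0, 50.0]], requires_grad=True)
target = torch.tensor([[12.0, 8.0, 48.0, 52.0]])
print(giou_loss(pred, target))
```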
@oke-aditya GroupNorm
@oke-aditya I think the PR is more about performance enhancement for the retinanet model.
Yes, I think it would be great to improve the model. Do you have some statistics on the mAP achievable with GroupNorm and gIoU? I'm asking so that I could give it a read 😅
@oke-aditya I think the mAP can increase by about 1-2 points if GN and GIoU loss are used.
@oke-aditya I want to add a new model, retinanet_gn_resnet50_fpn, which uses GN in the head.
@xiaohu2015 Thanks for the proposal. What other changes do you propose to make to the standard retinanet+resnet50 model builder other than passing GroupNorm? If the only thing needed is to pass GN, I'm not sure it's worth adding another model. This is something that users of TorchVision can do easily on their own. Let me know what you think, thanks!
@datumbox I want to make the following changes: add a norm layer option to the retinanet head subnetworks and support GIoU as the box-regression loss,
so that the user can use them directly (see the sketch after this comment for what such a call could look like).
Do you think it is valuable?
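A purely hypothetical sketch of what such a user-facing call could look like, based on the follow-up discussion about a head norm layer and a selectable box-regression loss; the `norm_layer` and `bbox_loss` argument names are illustrative only and are not part of the torchvision builder at the time of this thread:

```python
# Hypothetical API sketch: the keyword arguments shown in the comments below are
# illustrative names for the options discussed in this thread, not existing parameters.
from torchvision.models.detection import retinanet_resnet50_fpn

model = retinanet_resnet50_fpn(
    num_classes=91,
    # norm_layer=partial(torch.nn.GroupNorm, 32),  # hypothetical: GroupNorm(32) in the head
    # bbox_loss="giou",                            # hypothetical: GIoU instead of the default L1 loss
)
```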
@xiaohu2015 Thanks for the clarifications and for checking in before making changes.
Neither of the two subnetworks actually uses norm layers. The original RetinaNet paper (page 5) describes the classification and regression subnetworks as simple CNNs composed of Conv and ReLU layers. This is why the two subclasses don't receive such an argument. Networks that use them (for example SSDlite) support it. Note that TorchVision provides canonical implementations of models (i.e. we try to align as closely with the paper as possible) while at the same time giving users the option to expand/adapt them to their needs. In this specific case, users who want to add normalization layers to the heads can pass a custom RetinaNetHead implementation (a sketch of this route is included after this comment). As this deviates from the canonical implementation, I think it is something that should be done on the user side.
This again can be achieved by passing a custom head that implements the loss of choice. I do get that this is suboptimal, though. Currently the detection models work somewhat differently from Classification and Segmentation, where the loss is defined outside of the network by the user. The idea of removing it from the heads has been discussed before, and that's a direction we might want to take in the future. The key challenge here is that the loss often requires a lot of additional information, which makes for a messy API. @fmassa I think you are currently looking into making the detection models behave more like any other model by removing the transforms from them. Are you also experimenting with the losses, or is that out of scope for the investigation?
Just to clarify: your recommendations make sense and they are valuable. Here we are trying to maintain a delicate balance between keeping the API reasonable and easy to use vs giving power-users the option to customize the implementation. Sometimes there is no perfect solution for both. Add into the mix the backwards-compatibility guarantees of TorchVision and you get a very hard problem to solve.
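A rough sketch of the custom-head route mentioned above, assuming a torchvision RetinaNet whose head exposes `classification_head.conv` and `regression_head.conv` as plain Conv2d+ReLU stacks (attribute names may differ across torchvision versions):

```python
# Sketch: rebuild the head's conv stacks with GroupNorm inserted between Conv and ReLU.
# Assumes model.head.classification_head.conv / model.head.regression_head.conv exist,
# which matches torchvision's retinanet.py around the time of this discussion.
import torch.nn as nn
from torchvision.models.detection import retinanet_resnet50_fpn


def conv_gn_relu(channels: int, num_groups: int = 32) -> nn.Sequential:
    # Conv + GroupNorm + ReLU, replacing the paper's plain Conv + ReLU block.
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
        nn.GroupNorm(num_groups, channels),
        nn.ReLU(inplace=True),
    )


model = retinanet_resnet50_fpn(num_classes=91)
in_channels = 256  # FPN output channels feeding the head

model.head.classification_head.conv = nn.Sequential(*[conv_gn_relu(in_channels) for _ in range(4)])
model.head.regression_head.conv = nn.Sequential(*[conv_gn_relu(in_channels) for _ in range(4)])

# In practice the new layers should also be re-initialized (the paper uses a normal
# init with std=0.01 for the conv weights); omitted here for brevity.
```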
@datumbox I understand, but I still think we can make some changes to support the two features, which would be very helpful to users.
We will need more time to discuss this in detail and see how it fits into our plans for revamping the detection models. If the revamp work enables users in some other way, we should minimize introducing new parameters to the API to avoid breaking changes.
Thanks for the suggestion. The FCOS model wasn't in the list of proposed models maintained at #2707, so I just added it. It's certainly quite popular with lots of citations, so it checks that box. Let's take the discussion about the pros/cons of adding support for this architecture to that issue. We have alternative plans of adding DETR, which also goes in the same direction of eliminating anchor boxes.
@xiaohu2015 Just to clarify one part: you have submitted quite a few proposals that require training models. I was just wondering, if we do decide to move forward with one of them, do you have access to the necessary hardware to provide pre-trained weights? This is usually too much to ask from contributors, but if you have access to such a cluster, it does simplify things and can help us unblock some of these proposals. Thoughts?
@datumbox Although FCOS is anchor-free, it can be implemented using an anchor_generator (1 anchor per cell/position), just like in detectron2.
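A small sketch of that "one anchor per location" configuration using torchvision's AnchorGenerator; the sizes below are illustrative, not taken from the thread:

```python
# One size and one aspect ratio per FPN level => exactly one anchor per feature-map cell.
from torchvision.models.detection.anchor_utils import AnchorGenerator

anchor_generator = AnchorGenerator(
    sizes=((8,), (16,), (32,), (64,), (128,)),  # illustrative sizes, one per pyramid level
    aspect_ratios=((1.0,),) * 5,
)
print(anchor_generator.num_anchors_per_location())  # [1, 1, 1, 1, 1]
```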
@xiaohu2015 I don't know if you saw, but #5444 investigates your proposal. If we merge it, it will be possible to use GN and GIoU.
Yes, I think this will be solved by PR #5444, so I am closing this issue.
🚀 The feature
Motivation, pitch
As we know, RetinaNet should get better mAP if GN is used in the RetinaNet head and GIoU loss is used for box regression.
Alternatives
No response
Additional context
No response
cc @datumbox