[BC-breaking] Fix initialisation bug on FeaturePyramidNetwork #2954
Merged
fmassa merged 6 commits into pytorch:master on Nov 9, 2020
Conversation
639d06f to f042027
Codecov Report
@@ Coverage Diff @@
## master #2954 +/- ##
==========================================
+ Coverage 73.41% 73.44% +0.03%
==========================================
Files 99 99
Lines 8812 8812
Branches 1389 1389
==========================================
+ Hits 6469 6472 +3
+ Misses 1917 1915 -2
+ Partials 426 425 -1
fmassa (Member) approved these changes on Nov 9, 2020 and left a comment:
Thanks a lot for the PR and the investigation Vasilis!
I agree with all your points, I think it's not necessary for now to run more tests for keypoint rcnn.
We should at some point consider uploading newer weights, though, but we will need to handle versioning first.
bryant1410 pushed a commit to bryant1410/vision-1 that referenced this pull request on Nov 22, 2020:

* Change children() to modules() to ensure init happens in all blocks.
* Update expected values of all detection models.
* Revert "Update expected values of all detection models." (reverts commit 050b64a)
* Update expected values.
vfdev-5 pushed a commit to Quansight/vision that referenced this pull request on Dec 4, 2020, with the same commit message.
Fixes #2326
The FeaturePyramidNetwork was intended to initialize the weights of all its Conv2d layers using the distribution defined in vision/torchvision/ops/feature_pyramid_network.py, lines 89 to 92 at 971c3e4.
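That initialization loop looks roughly like the following sketch, applied here to a toy module rather than the actual torchvision source (TinyFPN is a hypothetical stand-in, not the real class):

```python
import torch
import torch.nn as nn

class TinyFPN(nn.Module):
    """Hypothetical stand-in for FeaturePyramidNetwork, for illustration only."""
    def __init__(self):
        super().__init__()
        self.inner_block = nn.Conv2d(4, 4, kernel_size=1)
        # Pre-fix style loop: iterate direct children and apply the
        # intended Kaiming-uniform initialization to every Conv2d found.
        for m in self.children():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_uniform_(m.weight, a=1)
                nn.init.constant_(m.bias, 0)

fpn = TinyFPN()
print(bool(torch.all(fpn.inner_block.bias == 0)))  # prints True: the direct child got the init
```

Note that the loop iterates `self.children()`, which is exactly where the bug described below comes from.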
Unfortunately, due to the use of children() instead of modules(), we only initialized the direct blocks of the FPN and not those of its nested modules, so the nested modules kept the default distribution from PyTorch. This PR changes children() to modules() to ensure that the weight initialization is consistent across all nested blocks of FeaturePyramidNetwork. The change, though trivial in terms of code, has side-effects:
The first issue is addressed in two separate PRs; see #2965 and #2966 for more information. To address the second, I retrained the detection models on master and on this branch and compared their stats. Overall, fixing the bug and applying the initialization correctly in all blocks leads to small improvements for all models except keypointrcnn.
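The difference between the two traversals can be seen on a minimal toy module (illustrative only, not the torchvision code):

```python
import torch.nn as nn

# A toy module with one direct Conv2d and one Conv2d nested inside a Sequential.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=1),                 # direct child
    nn.Sequential(nn.Conv2d(8, 8, kernel_size=3)),  # nested one level deeper
)

# children() only yields direct submodules, so it misses the nested Conv2d;
# modules() recurses through the whole tree and finds both.
via_children = [m for m in net.children() if isinstance(m, nn.Conv2d)]
via_modules = [m for m in net.modules() if isinstance(m, nn.Conv2d)]
print(len(via_children), len(via_modules))  # prints: 1 2
```

An init loop written over children() therefore skips every Conv2d buried inside a nested block, which is exactly the bug this PR fixes.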
Master
fasterrcnn_resnet50_fpn: boxAP=37.6
retinanet_resnet50_fpn: boxAP=36.3
maskrcnn_resnet50_fpn: boxAP=38.5, maskAP=34.5
keypointrcnn_resnet50_fpn: boxAP=55.7, kpAP=65.1
Branch
fasterrcnn_resnet50_fpn: boxAP=38.0
retinanet_resnet50_fpn: boxAP=36.4
maskrcnn_resnet50_fpn: boxAP=38.7, maskAP=34.7
keypointrcnn_resnet50_fpn: boxAP=55.5, kpAP=65.1
The slight deterioration in keypointrcnn's accuracy might be because of:
To fully investigate the situation, we would need to do multiple runs using different seeds for each initialization scheme and then conduct a statistical test to assess if our change has statistically significant effects on the accuracy. If the change turns out to have statistically significant negative results, we might want to support different init schemes per model.
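Such a comparison could be sketched with a two-sample Welch's t-test over per-seed AP values. The numbers below are made up purely for illustration (they are not real benchmark results); in practice one would use scipy.stats.ttest_ind, but a stdlib version is enough for a sketch:

```python
import math
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / math.sqrt(var_a + var_b)

# Hypothetical boxAP values from repeated runs with different seeds.
master_runs = [37.6, 37.5, 37.7, 37.6]
branch_runs = [38.0, 37.9, 38.1, 38.0]
print(round(welch_t(branch_runs, master_runs), 2))
```

A large |t| against the appropriate degrees of freedom would suggest the init change has a statistically significant effect; with only one training run per model, no such conclusion can be drawn.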
I believe that the above is not worth the effort because:
Based on the above, I recommend merging the change into master.