I mentioned in comment #94 that the prior boxes were not clipped correctly. I intended to compare the performance of using correct prior boxes versus "wrong" prior boxes, and just leave a comment to note this observation. However, I believe it is worth creating a new topic to discuss this further, as some interesting findings have emerged.
Interestingly, when the prior boxes are clipped correctly, the model achieves an mAP of 76.24, whereas with the incorrectly clipped ("wrong") prior boxes it achieves an mAP of 77.11, which is close to the originally reported performance.
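For reference, here is a minimal sketch (in PyTorch) of the two variants being compared. The helper names `cxcy_to_xy` / `xy_to_cxcy` and the fractional center-size prior format are assumptions about the repo's conventions, not copied from it:

```python
import torch

def cxcy_to_xy(cxcy):
    # (cx, cy, w, h) -> (xmin, ymin, xmax, ymax), all in fractional coordinates
    return torch.cat([cxcy[:, :2] - cxcy[:, 2:] / 2,
                      cxcy[:, :2] + cxcy[:, 2:] / 2], dim=1)

def xy_to_cxcy(xy):
    # (xmin, ymin, xmax, ymax) -> (cx, cy, w, h)
    return torch.cat([(xy[:, :2] + xy[:, 2:]) / 2,
                      xy[:, 2:] - xy[:, :2]], dim=1)

def clip_priors_wrong(priors_cxcy):
    # "Wrong" clipping: clamping the center-size coordinates directly does not
    # confine the boxes to the image; priors near the border still shoot over
    # the edges, but their centers stay at the feature-map cell centers.
    return priors_cxcy.clamp(0, 1)

def clip_priors_correct(priors_cxcy):
    # "Correct" clipping: convert to boundary coordinates, clamp to [0, 1],
    # then convert back; boxes no longer overshoot, but their centers move.
    return xy_to_cxcy(cxcy_to_xy(priors_cxcy).clamp(0, 1))
```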
The experiment was conducted with a batch size of 32, a learning rate of 1e-3 (doubled for biases), weight decay 5e-4, momentum 0.9, and 120k iterations with learning-rate decays at 80k and 100k iterations. This matches the original paper, except that data augmentation is implemented with numpy following https://github.com/amdegroot/ssd.pytorch, which makes data loading faster. I also made some modifications to the augmentations, such as adding a difficulty annotation tensor (for a reason that is not important here), and normalizing the images with the ImageNet mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225] (in RGB order).
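For completeness, a minimal sketch of that normalization and optimizer setup. The bias/weight split and the milestone scheduler are my assumptions about how "doubled for biases" and the 80k/100k adjustments would be implemented, and `build_optimizer` is a hypothetical helper:

```python
import torch
import torchvision.transforms as T

# ImageNet statistics used to normalize the input images (RGB order)
normalize = T.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225])

def build_optimizer(model):
    # Split parameters so biases get twice the base learning rate,
    # mirroring the SSD training recipe described above.
    biases, weights = [], []
    for name, param in model.named_parameters():
        if param.requires_grad:
            (biases if name.endswith('.bias') else weights).append(param)

    optimizer = torch.optim.SGD(
        [{'params': biases, 'lr': 2e-3},   # doubled learning rate for biases
         {'params': weights}],
        lr=1e-3, momentum=0.9, weight_decay=5e-4)

    # Decay the learning rate at 80k and 100k iterations
    # (scheduler stepped once per training iteration).
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[80_000, 100_000], gamma=0.1)
    return optimizer, scheduler
```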
In my opinion, the "wrong" prior boxes extend beyond the image, but each group of them shares the same center (cx, cy). The correct prior boxes, on the other hand, must be clipped, which can shift their centers and may make the model slightly less stable than the "wrong" one.
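As a concrete (hypothetical) illustration of that center shift: take a single prior near the top-left border with center (0.05, 0.05) and width/height 0.30. Clamping its boundary coordinates moves the center away from the feature-map cell center:

```python
import torch

# One prior near the top-left border, in fractional (cx, cy, w, h)
prior = torch.tensor([[0.05, 0.05, 0.30, 0.30]])

# Boundary coordinates extend past the image: (-0.10, -0.10, 0.20, 0.20)
xy = torch.cat([prior[:, :2] - prior[:, 2:] / 2,
                prior[:, :2] + prior[:, 2:] / 2], dim=1)

clipped_xy = xy.clamp(0, 1)                            # (0.00, 0.00, 0.20, 0.20)
clipped_center = (clipped_xy[:, :2] + clipped_xy[:, 2:]) / 2

print(clipped_center)  # tensor([[0.1000, 0.1000]]) -- shifted from (0.05, 0.05)
```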
Thanks, @AakiraOtok. I'll link to your findings. Yeah, you may be right, I suppose the clipped priors not being symmetrical about the kernel centers might cause that slight difference (or the difference is negligible enough that it's hard to say there's any real difference).