🚀 The feature
Source
Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization (Scholar, Arxiv)
Number of citations: 542
Method
The image is divided into a grid, and every patch of that grid is masked independently with probability p. The inputs are therefore patch_size, p and fill_value (see the sketch below).
Motivation, pitch
As described in the paper, this augmentation can improve weakly-supervised object localization: the network is forced to find all relevant parts of an object instead of only its most discriminative ones.
I have already implemented this method, so I could open a PR if you think this feature would be a useful addition.
Alternatives
No response
Additional context
Here is an example from the paper.
cc @vfdev-5 @datumbox
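For reference, here is a minimal sketch of how such a transform could look, assuming a tensor image in (..., C, H, W) layout; the class name HideAndSeek and the implementation details are only an illustration of the description above, not the code from the existing implementation:

```python
import torch


class HideAndSeek(torch.nn.Module):
    """Divide the image into a grid of patch_size x patch_size cells and hide
    each cell independently with probability p by filling it with fill_value
    (e.g. the per-channel mean of the training set)."""

    def __init__(self, patch_size: int, p: float = 0.5, fill_value=0.0):
        super().__init__()
        self.patch_size = patch_size
        self.p = p
        self.fill_value = fill_value

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (..., C, H, W); border cells may be smaller than patch_size.
        h, w = img.shape[-2], img.shape[-1]
        fill = torch.as_tensor(self.fill_value, dtype=img.dtype)
        if fill.ndim == 1:  # per-channel fill value, e.g. dataset mean RGB
            fill = fill.view(-1, 1, 1)
        out = img.clone()
        for top in range(0, h, self.patch_size):
            for left in range(0, w, self.patch_size):
                if torch.rand(()) < self.p:
                    out[..., top:top + self.patch_size,
                        left:left + self.patch_size] = fill
        return out
```

Passing a per-channel fill_value such as the training-set mean (see the discussion below) keeps the hidden patches close to the statistics the network sees at test time.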
@faberno Thanks a lot for the proposal. The augmentation seems a bit old but definitely has a sufficient number of citations, so it's worth considering.
I'll need to check the paper in more detail, but I was hoping you could provide some clarification on how it operates:
During training, do we actually pass an image with darkened patches to the model? If yes, I wonder how this interacts with Transformer-based models. If not, are the hidden patches encoded in a specific way by the patchify layers?
In object detection and semantic/instance segmentation, what happens to the targets (bboxes, masks, labels, etc.) within the hidden patches? This augmentation might be tricky for partially hidden objects.
Hey @datumbox, thanks for your answer. Regarding your questions:
Yes, we pass the image with the filled-in patches to the model. However, the paper recommends that they not simply be black but rather be filled with the mean RGB value over all images in the training set, to avoid a distribution mismatch between the activations at training and test time.
This method does not account for hidden targets, and as far as I understand, that's part of the strategy: the masking probability and the patch size are chosen such that the targets are unlikely to be completely hidden. That way, in every epoch the model sees slightly different parts of the object and therefore learns to find the whole object instead of just its most discriminative parts (see the picture below).
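For completeness, here is a minimal sketch of computing that per-channel mean to pass as fill_value, assuming a standard dataset that yields (image, target) pairs of float tensors in [0, 1]; the helper name channel_mean is hypothetical:

```python
import torch
from torch.utils.data import DataLoader


def channel_mean(dataset, batch_size: int = 64) -> torch.Tensor:
    """Compute the per-channel mean over a dataset of (image, target) pairs,
    intended as the fill value so that hidden patches match the statistics
    the network sees at test time (when nothing is hidden)."""
    loader = DataLoader(dataset, batch_size=batch_size)
    total = torch.zeros(3)
    n_pixels = 0
    for images, _ in loader:  # images: (B, 3, H, W) float tensors in [0, 1]
        total += images.sum(dim=(0, 2, 3))
        n_pixels += images.shape[0] * images.shape[2] * images.shape[3]
    return total / n_pixels
```

Since patches are hidden independently, an object spanning k grid cells is fully hidden with probability p^k, so with p = 0.5 even moderately sized objects are almost never removed entirely in a given epoch.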