
‘Hide-and-Seek’ Random Masking Transform #6796

Open
faberno opened this issue Oct 19, 2022 · 2 comments
faberno commented Oct 19, 2022

🚀 The feature

Source
Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization (Scholar, Arxiv)
Number of citations: 542

Method
The image is divided into a grid, and each patch of the grid is independently masked with probability p. The inputs are therefore patch_size, p, and fill_value.
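The grid masking described above could be sketched roughly as follows. This is only a minimal illustration assuming a CHW `torch.Tensor` image; the class name `HideAndSeek` and the exact parameter handling are hypothetical, not an existing torchvision API:

```python
import torch


class HideAndSeek:
    """Sketch of a Hide-and-Seek transform: hide each grid patch with probability p."""

    def __init__(self, patch_size: int, p: float = 0.5, fill_value: float = 0.0):
        self.patch_size = patch_size
        self.p = p
        self.fill_value = fill_value

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        img = img.clone()
        _, h, w = img.shape
        s = self.patch_size
        # walk the grid; edge patches may be smaller than patch_size
        for top in range(0, h, s):
            for left in range(0, w, s):
                # each patch is hidden independently with probability p
                if torch.rand(1).item() < self.p:
                    img[:, top:top + s, left:left + s] = self.fill_value
        return img
```

With `p=0` the image passes through unchanged; with `p=1` every patch is replaced by `fill_value`.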

Motivation, pitch

As described in the paper, this augmentation can improve weakly-supervised object localization: the network is forced to find not only the most discriminative parts of the image but all relevant ones.

I have already implemented this method, so I could open a PR if you think this feature would be a useful addition.

Alternatives

No response

Additional context

Here is an example from the paper.
Screenshot from 2022-10-19 22-56-54

cc @vfdev-5 @datumbox

@datumbox
Contributor

@faberno Thanks a lot for the proposal. The augmentation is a bit old but definitely has a sufficient number of citations, and it's worth considering.

I'll need to check the paper in more detail, but I was hoping you could provide some clarifications concerning its operation:

  • During training, do we actually pass an image that has dark patches to the model? If yes, I wonder how this interacts with Transformer-based models. If no, are the hidden patches encoded in a specific way by the patchify layers?
  • In object detection, semantic & instance segmentation, what happens to the targets (bboxes, masks, labels etc) within the hidden patches? This augmentation might be tricky for partially hidden objects.

Thanks!


faberno commented Oct 20, 2022

Hey @datumbox, thanks for your answer. Regarding your questions:

  • Yes, we pass an image with differently coloured patches to the model. However, the paper recommends that the patches not be plain black but rather the mean RGB value over all images in the training dataset, to avoid a distribution mismatch between the activations at train and test time.
  • This method does not account for hidden targets, and as far as I understand, that's part of the strategy. The masking probability and the patch size are chosen so that the targets are unlikely to be fully hidden. That way, in every epoch the model learns slightly different parts of the object and can therefore find the whole object instead of just its most discriminative parts (see picture below).
    Screenshot from 2022-10-20 19-21-00
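The mean-RGB fill value mentioned in the first point could be estimated once over the training set, along these lines. This is a hypothetical helper (the name `channel_mean` is mine, not from the paper or torchvision), assuming CHW tensor images:

```python
import torch


def channel_mean(images):
    """Per-channel mean over an iterable of CHW training images,
    usable as the fill value for the hidden patches."""
    total = torch.zeros(3)
    n = 0
    for img in images:
        total += img.mean(dim=(1, 2))  # average over spatial dims
        n += 1
    return total / n
```

The resulting 3-vector would then be broadcast into each hidden patch, e.g. `img[:, top:top+s, left:left+s] = mean[:, None, None]`.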
