@wnma3mz wnma3mz commented Apr 30, 2022

While reading the paper authors' code, I found that they use Conv and PixelShuffle layers in the decoder. To make sure the MIM method stays effective, maybe we should use the same decoder here.

P.S. Unfortunately, I have not modified the loss function yet, because the mask is generated differently here.

https://github.com/microsoft/SimMIM/blob/bec329f54c2f84db16974035c3f010a88f1b3eb0/models/simmim.py#L104-L121
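For readers who have not opened the link, here is a minimal sketch of a Conv + PixelShuffle decoder head of the kind described above. It is an illustration, not the PR's actual diff: the class name `ConvPixelShuffleDecoder`, the feature width `dim=64`, and the stride of 16 are assumptions made up for the example.

```python
import torch
from torch import nn


class ConvPixelShuffleDecoder(nn.Module):
    """Hypothetical decoder head: 1x1 conv to stride**2 * channels, then PixelShuffle."""

    def __init__(self, dim: int, stride: int, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            # Each spatial position predicts all pixels of its stride x stride patch.
            nn.Conv2d(dim, stride ** 2 * channels, kernel_size=1),
            # Move the per-position pixel predictions into the spatial dimensions.
            nn.PixelShuffle(stride),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, dim, H / stride, W / stride) -> (batch, channels, H, W)
        return self.net(feats)


# Assumed sizes for illustration: a 14 x 14 feature map with stride 16.
decoder = ConvPixelShuffleDecoder(dim=64, stride=16)
feats = torch.randn(2, 64, 14, 14)
recon = decoder(feats)
```

With a ViT encoder, the token sequence of shape `(batch, num_patches, dim)` would first have to be reshaped into a `(batch, dim, h, w)` feature map before being passed to such a head.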

@lucidrains
Owner

oh hey! yea I believe it is equivalent

at least, I could do an extra rearrange on the predicted pixels to get back the reconstructed image
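The equivalence claimed here can be checked numerically. The sketch below is an illustration under assumptions, not code from either repository: it copies the weights of a per-token `nn.Linear` into a 1x1 `nn.Conv2d`, then compares "linear + rearrange" against "conv + PixelShuffle". The sizes are arbitrary, and the channel order inside each patch is assumed to be `(c, p1, p2)`; a different order would only amount to a fixed permutation of the output channels.

```python
import torch
from torch import nn

torch.manual_seed(0)
b, dim, p, c, h, w = 2, 32, 4, 3, 5, 6  # arbitrary sizes for the check

tokens = torch.randn(b, h * w, dim)  # (batch, num_patches, dim)

# Path A: per-token linear projection, then rearrange patches into an image.
to_pixels = nn.Linear(dim, c * p * p)
pix = to_pixels(tokens)  # (b, h*w, c*p*p), assumed channel order (c, p1, p2)
img_a = (
    pix.reshape(b, h, w, c, p, p)
    .permute(0, 3, 1, 4, 2, 5)  # -> (b, c, h, p1, w, p2)
    .reshape(b, c, h * p, w * p)
)

# Path B: 1x1 conv sharing the same weights, then PixelShuffle.
conv = nn.Conv2d(dim, c * p * p, kernel_size=1)
with torch.no_grad():
    conv.weight.copy_(to_pixels.weight[:, :, None, None])
    conv.bias.copy_(to_pixels.bias)
feats = tokens.transpose(1, 2).reshape(b, dim, h, w)  # tokens as a feature map
img_b = nn.PixelShuffle(p)(conv(feats))

equivalent = torch.allclose(img_a, img_b, atol=1e-4)
print(equivalent)
```

So the two decoders differ in how the output is laid out, not in what they can represent; the extra rearrange is what recovers the image from the predicted pixels.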

@wnma3mz
Author

wnma3mz commented Apr 30, 2022

Thank you for your reply.

Yes, it is. I thought the original restoration method (Conv + PixelShuffle) should be followed. If you don't think it's important, feel free to close the PR.

@lucidrains lucidrains force-pushed the main branch 4 times, most recently from 70284c0 to 4ef72fc Compare May 3, 2022 17:29
@lucidrains
Owner

@wnma3mz i think being able to get back the reconstructed image is a good idea, let me get the function out when i find some time. feel free to leave this open!

@wnma3mz
Author

wnma3mz commented May 4, 2022

It's hard not to support this, and it will help many more people, including me. If there is anything I can do to help, please contact me directly.
