Why is only dino_vits8 supported? #13

Open
juancamilog opened this issue Mar 9, 2023 · 5 comments

Comments

@juancamilog

In the examples, if you change the model_type to anything other than dino_vits8, the code crashes because of an assert in ViTExtractor.extract_saliency_maps. What needs to change to properly support other model types?

@RickyYXY

RickyYXY commented Jun 4, 2023

+1
This is weird; hope the authors can give us an answer.
Thanks

@RickyYXY

RickyYXY commented Jun 5, 2023

@juancamilog I read the code again and found that this may be because the authors have only tried head_idxs = [0, 2, 4, 5] for ViT-S. Since ViT-B/L/G have a different number of heads, suitable head_idxs would also need to be found for them (see the sketch below for checking head counts).
But that's not a simple question...
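
For reference, the head count of each DINO backbone can be checked directly; this is a minimal sketch, assuming the official torch.hub entrypoints from facebookresearch/dino:

```python
import torch

# Compare the number of attention heads per layer in the small vs. base DINO ViTs.
# head_idxs = [0, 2, 4, 5] was chosen for the 6-head ViT-S; ViT-B has 12 heads,
# so those indices do not transfer directly to the larger models.
for name in ["dino_vits8", "dino_vitb8"]:
    model = torch.hub.load("facebookresearch/dino:main", name)
    print(name, "->", model.blocks[-1].attn.num_heads, "heads per layer")
    # expected: dino_vits8 -> 6, dino_vitb8 -> 12
```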

@ShirAmir
Owner

Hi!

The saliency maps used in the co-segmentation, part co-segmentation and correspondence examples are acquired by aggregating heads 0, 2, 4, 5 of dino_vits8. We removed heads 1 and 3 as they empirically attended to background areas. It is also possible to change the code to use a different DINO ViT and aggregate all of its heads, but that would require adjusting some of the hyperparameters in each application.

LMK if you have further questions 🙏
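
For illustration, the head aggregation described above might look roughly like the following; this is a minimal sketch, not the repository's actual implementation, and the input preprocessing is omitted:

```python
import torch

# Sketch of the head-aggregation idea: take the CLS-token attention of the last
# layer, keep only the selected heads, average them, and reshape the result into
# a patch-grid saliency map. Not the repo's exact code.
model = torch.hub.load("facebookresearch/dino:main", "dino_vits8").eval()
head_idxs = [0, 2, 4, 5]              # heads kept for dino_vits8 (1 and 3 attend to background)
img = torch.randn(1, 3, 224, 224)     # placeholder for a normalized input image

with torch.no_grad():
    attn = model.get_last_selfattention(img)    # (1, num_heads, tokens, tokens)

cls_attn = attn[0, head_idxs, 0, 1:]            # CLS -> patch attention for the chosen heads
saliency = cls_attn.mean(dim=0)                 # aggregate the selected heads
saliency = saliency.reshape(224 // 8, 224 // 8) # patch size 8 -> 28x28 map
saliency = saliency / saliency.max()            # normalize to [0, 1]
```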

@RickyYXY

Thanks for your reply. If we use ViT models with more heads, can the 0, 2, 4, 5 indices stay the same? I'm not sure the larger models' heads will give similar attention results.

By the way, I have tried using dinov2's weights, but found the results are even worse. It seems that the patch_size significantly influences the foreground segmentation results: the smaller the patch_size, the better the result. Do you have any ideas about using dinov2's features? I only found ViT/14 pretrained models in their repo.
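
For what it's worth, the published DINOv2 backbones can be listed via torch.hub; all of them use patch size 14 (it is encoded in the model names), which is coarser than dino_vits8's patch size of 8. A small sketch:

```python
import torch

# List the DINOv2 entrypoints published by facebookresearch/dinov2.
# All backbone names end in "14", i.e. patch size 14 only, whereas the
# saliency maps above rely on dino_vits8's finer patch size of 8.
print(torch.hub.list("facebookresearch/dinov2"))
# e.g. ['dinov2_vitb14', 'dinov2_vitg14', 'dinov2_vitl14', 'dinov2_vits14', ...]
```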

@krishnaadithya

@RickyYXY did you get it working for vitb?
