VideoMAE ViT-H pre-train does not contain the decoder weights #89

sandstorm12 · 2023-04-13T22:33:47Z

Problem

The VideoMAE ViT-H and VideoMAE ViT-S pre-trained Kinetics checkpoints seem to have a problem. For other pre-trained models such as ViT-L or ViT-B, the loaded state_dict contains the weights for the decoder layers. This is not the case for ViT-H and ViT-S. As a result, these checkpoints cannot be loaded into an encoder/decoder setup.

How to reproduce

To reproduce, download the weights and load the state_dict. Comparing its keys with those of the other pre-trained checkpoints shows that the decoder weights are missing.

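import gdown
import torch

URL = "https://drive.google.com/file/d/1AJQR1Rsi2N1pDn9tLyJ8DQrUREiBA1bO/view?usp=sharing"
output_name = "checkpoint.pth"
gdown.cached_download(URL, output_name, fuzzy=True)  # fuzzy=True parses the ".../view" share link

# Load on CPU and list only the parameter names
state_dict = torch.load(output_name, map_location="cpu")
print(list(state_dict["module"].keys()))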

The state_dict is very large, so I don't include the output here.
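
Since the full output is too large to compare by eye, a small helper can list only the decoder-side keys. This is a minimal sketch, not code from the VideoMAE repository: the function name `find_decoder_keys` is hypothetical, and the prefixes `decoder.` and `encoder_to_decoder.` are assumptions about how the decoder parameters are named. The toy dictionaries stand in for real checkpoints.

```python
from typing import Iterable, List, Mapping

# Assumed prefixes for decoder-side parameters; adjust to the actual model.
DECODER_PREFIXES = ("decoder.", "encoder_to_decoder.")


def find_decoder_keys(state_dict: Mapping[str, object],
                      prefixes: Iterable[str] = DECODER_PREFIXES) -> List[str]:
    """Return the sorted parameter names that start with any decoder prefix."""
    prefixes = tuple(prefixes)
    return sorted(k for k in state_dict if k.startswith(prefixes))


# Toy stand-ins for real checkpoints (values would normally be tensors).
full_ckpt = {
    "encoder.blocks.0.attn.qkv.weight": None,
    "encoder_to_decoder.weight": None,
    "decoder.blocks.0.attn.qkv.weight": None,
}
encoder_only_ckpt = {
    "encoder.blocks.0.attn.qkv.weight": None,
}

print(find_decoder_keys(full_ckpt))
print(len(find_decoder_keys(encoder_only_ckpt)))
```

With a real checkpoint, the same function could be called on `state_dict["module"]`; an empty result would suggest that the file holds only the encoder, provided the assumed prefixes match the checkpoint's naming.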


innat · 2023-08-26T07:37:23Z

The link for the pre-trained VideoMAE ViT-H checkpoint appears to be wrong.
The file contains only the encoder part.

innat · 2023-08-27T04:54:28Z

cc. @yztongzhan
