Problem
The VideoMAE ViT-H and VideoMAE ViT-S pre-trained Kinetics weights appear to be incomplete. The ViT-L and ViT-B pre-trained checkpoints include the decoder weights in their state_dict, but the ViT-H and ViT-S checkpoints do not. As a result, the ViT-H and ViT-S weights cannot be loaded into the full encoder/decoder pre-training model.
How to reproduce
To reproduce, download the weights and load the state_dict. Comparing its keys with those of the other pre-trained checkpoints shows that the decoder weights are missing.
The state_dict is very large, so I don't include the output here.
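Instead of printing the whole state_dict, a compact way to compare checkpoints is to count parameter names by their top-level prefix and check whether any start with `decoder.`. The script below is a minimal sketch, not part of the VideoMAE repository. It assumes that decoder parameters are named with a `decoder.` prefix and that the weights may be nested under a key such as `model`, `module`, or `state_dict`. Both are assumptions; adjust them to the actual key layout. The helper names are hypothetical.

```python
from collections import Counter
import sys


def unwrap_state_dict(checkpoint):
    """Return the parameter dict from a loaded checkpoint.

    Assumption: training scripts often nest the weights under a key such as
    "model", "module" or "state_dict"; otherwise the checkpoint is the state_dict.
    """
    for key in ("model", "module", "state_dict"):
        if isinstance(checkpoint, dict) and isinstance(checkpoint.get(key), dict):
            return checkpoint[key]
    return checkpoint


def prefix_counts(state_dict):
    """Count parameters by top-level module name (text before the first '.')."""
    return Counter(name.split(".", 1)[0] for name in state_dict)


def has_decoder(state_dict, prefix="decoder"):
    """True if any parameter name starts with the (assumed) decoder prefix."""
    return any(name.startswith(prefix + ".") for name in state_dict)


def main(paths):
    # Imported here so the helpers above can be used without PyTorch installed.
    import torch

    for path in paths:
        # On PyTorch >= 2.6, torch.load defaults to weights_only=True; if the
        # checkpoint also stores non-tensor objects, weights_only=False may be
        # needed (only do this for files you trust).
        checkpoint = torch.load(path, map_location="cpu")
        state_dict = unwrap_state_dict(checkpoint)
        print(f"{path}: {len(state_dict)} tensors, "
              f"decoder present: {has_decoder(state_dict)}")
        for prefix, count in sorted(prefix_counts(state_dict).items()):
            print(f"  {prefix}: {count}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it on two checkpoints side by side, for example `python inspect_ckpt.py vit_b_checkpoint.pth vit_h_checkpoint.pth` (placeholder file names), should make a missing `decoder` prefix visible in a few lines of output.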