size mismatch for vision_model.embeddings.patch_embedding.weight:

Hello, author.


When I run the inference demo for the model "lmms-lab/LLaVA-Video-7B-Qwen2", loading the vision tower (siglip-so400m-patch14-384) fails with the following error:


  File "/home/jeeves/LLaVA-NeXT-main/llava/model/multimodal_encoder/clip_encoder.py", line 41, in load_model
    self.vision_tower = CLIPVisionModel.from_pretrained(self.vision_tower_name, device_map=device_map)
  File "/home/jeeves/.conda/envs/zyy_llava_next_video/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3677, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/jeeves/.conda/envs/zyy_llava_next_video/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4155, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for CLIPVisionModel:
        size mismatch for vision_model.embeddings.patch_embedding.weight: copying a param with shape torch.Size([1152, 3, 14, 14]) from checkpoint, the shape in current model is torch.Size([768, 3, 32, 32]).
        size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([729, 1152]) from checkpoint, the shape in current model is torch.Size([50, 768]).
        size mismatch for vision_model.encoder.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1152, 1152]) from checkpoint, the shape in current model is torch.Size([768, 768]).
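
The shapes in the error can be checked with a little arithmetic. The sketch below is only an illustration, not code from the repository: the helper `num_positions` is hypothetical, and the image and patch sizes are assumptions (224/32 with a class token for a default-configured CLIP vision model, 384/14 without a class token for the SigLIP checkpoint, as its name suggests). If those assumptions hold, the "current model" side of each mismatch looks like a CLIPVisionModel built from default CLIP settings, while the checkpoint side looks like SigLIP weights.

```python
# Hypothetical helper: number of rows in position_embedding.weight
# for a ViT-style vision tower.
def num_positions(image_size: int, patch_size: int, has_cls_token: bool) -> int:
    patches_per_side = image_size // patch_size
    # One position per patch, plus one if the model prepends a class token.
    return patches_per_side ** 2 + (1 if has_cls_token else 0)


# Assumed: default CLIP vision settings (224 px image, 32 px patches, class token).
clip_default_positions = num_positions(224, 32, has_cls_token=True)

# Assumed: SigLIP so400m settings (384 px image, 14 px patches, no class token).
siglip_positions = num_positions(384, 14, has_cls_token=False)

# Compare with the traceback: [50, 768] in the current model,
# [729, 1152] in the checkpoint.
print(clip_default_positions, siglip_positions)
```

Under these assumptions the two values line up with the 50 and 729 reported for position_embedding.weight, which would suggest the SigLIP checkpoint is being loaded through the CLIP code path rather than a SigLIP-specific one.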

size mismatch for vision_model.embeddings.patch_embedding.weight: #246
