
Summary

Fixes several issues that prevent loading pretrained LISAT models in different environments.

Problems Fixed

1. ValueError: Unknown vision tower: /home/patrickwu/...

The saved model config contains a hardcoded absolute path from the original training machine; when users download the model, that path doesn't exist locally.

Fix: builder.py now falls back to openai/clip-vit-large-patch14 when the config path doesn't exist locally.
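A minimal sketch of the fallback, following the shape of LLaVA's build_vision_tower (the mm_vision_tower/vision_tower attributes and the CLIPVisionTower import are LLaVA conventions; the exact code in the PR may differ):

```python
import os

from .clip_encoder import CLIPVisionTower  # as in LLaVA's builder.py

FALLBACK_VISION_TOWER = "openai/clip-vit-large-patch14"

def build_vision_tower(vision_tower_cfg, **kwargs):
    vision_tower = getattr(
        vision_tower_cfg,
        "mm_vision_tower",
        getattr(vision_tower_cfg, "vision_tower", None),
    )
    # The saved config may hold an absolute path from the training machine;
    # if that path is missing locally, substitute the public CLIP checkpoint.
    if vision_tower and os.path.isabs(vision_tower) and not os.path.exists(vision_tower):
        vision_tower = FALLBACK_VISION_TOWER
    if vision_tower and (os.path.exists(vision_tower) or vision_tower.startswith("openai")):
        return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs)
    raise ValueError(f"Unknown vision tower: {vision_tower}")
```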

2. KeyError: 'train_mask_decoder'

During inference loading via from_pretrained(), kwargs such as train_mask_decoder and out_dim aren't passed, so indexing kwargs['train_mask_decoder'] directly raises a KeyError.

Fix: LISAT.py and LISAT_eval.py now use kwargs.get() with sensible defaults.
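An illustrative before/after for the constructor; the class below is a simplified stand-in, and the default values (False, 256) are assumptions, not necessarily what the PR uses:

```python
import torch.nn as nn

class LisatModel(nn.Module):
    """Simplified stand-in for the real LISAT model class."""

    def __init__(self, config, **kwargs):
        super().__init__()
        self.config = config
        # Before: config.train_mask_decoder = kwargs["train_mask_decoder"]
        # raised KeyError under from_pretrained(), which forwards no kwargs.
        # After: read defensively with defaults.
        self.config.train_mask_decoder = kwargs.get("train_mask_decoder", False)
        self.config.out_dim = kwargs.get("out_dim", 256)
```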

3. Transformers version/registration errors

Strict version checks in transformers cause ImportError at import time. Additionally, registering the same model type twice via AutoConfig.register raises ValueError: already used.

Fix: Runtime monkey-patch disables strict version checks before importing transformers, and wraps registration methods to ignore duplicates.
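A minimal sketch of such a patch, run before the first import transformers. The stubbed module path and the wrapped register methods are assumptions about transformers' internals around 4.31; the PR's actual patch may differ:

```python
import sys
import types

# Replace transformers' import-time dependency check with an empty module
# so pinned-version mismatches don't raise ImportError.
sys.modules["transformers.dependency_versions_check"] = types.ModuleType(
    "transformers.dependency_versions_check"
)

import transformers
from transformers import AutoConfig, AutoModelForCausalLM

def _ignore_duplicates(register):
    """Wrap an Auto* register method so re-registering a model type
    (ValueError: '... already used ...') is silently ignored."""
    def wrapper(*args, **kwargs):
        try:
            return register(*args, **kwargs)
        except ValueError:
            pass  # duplicate registration: keep the existing entry
    return wrapper

AutoConfig.register = staticmethod(_ignore_duplicates(AutoConfig.register))
AutoModelForCausalLM.register = staticmethod(
    _ignore_duplicates(AutoModelForCausalLM.register)
)
```

Pre-inserting the stub works because transformers imports that module at package-import time purely for the side effect of checking pinned dependency versions, so an empty module under the same name skips the check.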

Files Changed

  • model/LISAT.py — kwargs defaults + transformers patch
  • model/LISAT_eval.py — kwargs defaults + transformers patch
  • model/llava/model/multimodal_encoder/builder.py — vision tower fallback

Testing

  • Syntax verified with py_compile
  • Successfully tested in a Modal deployment with Python 3.12, torch 2.3, and transformers 4.31

Contribution by @wildcraft958

- LISAT.py: Add transformers version patch, use kwargs.get() with defaults
- builder.py: Fallback to openai/clip-vit-large-patch14 when saved path doesn't exist

Fixes issues reported on GitHub: missing train_mask_decoder kwargs and hardcoded vision tower paths from the original author's machine causing ValueError.