This repository contains a script to test a fine-tuned LoRA model for multimodal (vision-language) tasks.
1. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```
2. Make sure you have the LoRA model files in the `lora_model` directory:
   - adapter_config.json
   - adapter_model.safetensors
   - special_tokens_map.json
   - tokenizer.json
   - tokenizer_config.json
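Before running, you can sanity-check that the adapter files are in place. This is a minimal sketch; the helper name is hypothetical, and the file list simply mirrors the directory layout above:

```python
from pathlib import Path

# Adapter files the test script expects inside the directory (listed above).
REQUIRED_FILES = [
    "adapter_config.json",
    "adapter_model.safetensors",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
]

def missing_adapter_files(model_dir="lora_model"):
    """Return the expected adapter files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

If the returned list is non-empty, copy the missing files into `lora_model` before running `test.py`.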
3. Update the `image_path` variable in `test.py` to point to your own image file:

   ```python
   image_path = "your_image.jpg"  # Change this to your image path
   ```
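A quick check on the path can save a full model load. A minimal sketch, assuming a PIL-style image loader; the helper name and the extension list are assumptions, not something `test.py` enforces:

```python
from pathlib import Path

# Extensions commonly handled by PIL-based image loaders (an assumption).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}

def validate_image_path(image_path):
    """Raise a descriptive error if image_path does not look like a readable image file."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"No file at {image_path!r}; update image_path in test.py")
    if path.suffix.lower() not in IMAGE_EXTENSIONS:
        raise ValueError(f"Unexpected extension {path.suffix!r}; expected one of {sorted(IMAGE_EXTENSIONS)}")
```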
4. Run the script:

   ```bash
   python test.py
   ```
5. Customize the instruction in the script as needed for different types of image analysis:

   ```python
   instruction = "You are an expert radiographer. Describe accurately what you see in this image."
   ```
- For optimal performance, a CUDA-compatible GPU is recommended
- If no GPU is available, the script falls back to CPU mode (much slower)
- If you encounter CUDA out-of-memory errors, adjust the model loading parameters (for example, load the model in reduced precision)
- If the model fails to load, make sure the base model is accessible
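The GPU/CPU fallback described in the notes above follows the usual PyTorch pattern; a sketch (not the exact code in `test.py`, and the function name is hypothetical):

```python
def pick_device():
    """Choose CUDA when available, otherwise fall back to CPU (much slower)."""
    try:
        import torch  # assumed to be installed via requirements.txt
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # without torch installed, only CPU is an option
    return "cpu"
```

For out-of-memory errors, common knobs include loading the base model in half precision (`torch_dtype=torch.float16`) or quantized; which options apply depends on how `test.py` loads the model.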