An image quality assessment framework leveraging visual-language models for multi-granularity semantic fidelity.
Hanfei Li1 · Anle Ke1 · Jiawen Gu2 · Chao Zhou2 · Tong Chen1 · Zhan Ma1
1 Nanjing University 2 Kuaishou Technology
Before installing the dependencies, please prepare the model weights.
Create the following directory structure in the project root:
weights/
├── Gemma/
└── SAM/
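The layout above can be created in one step from the project root. A minimal sketch, assuming a POSIX shell:

```shell
# Create the weight folders shown above; -p makes the command safe to re-run.
mkdir -p weights/Gemma weights/SAM
```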
Download the required model weights from Hugging Face and place them in the corresponding folders:
Gemma weights: https://huggingface.co/google/gemma-2b-it
SAM weights: https://huggingface.co/google-bert/bert-base-uncased
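One possible way to fetch both repositories is the `huggingface-cli download` command that ships with the `huggingface_hub` package. The sketch below is an assumption rather than the authors' documented procedure: the helper name `fetch_weights` is hypothetical, and the folder mapping simply follows the list above. The Gemma repository is gated, so you will likely need to accept its license on Hugging Face and run `huggingface-cli login` first.

```shell
# Hypothetical helper: download both repositories into the folders listed above.
# Assumes huggingface_hub is installed (pip install -U huggingface_hub).
fetch_weights() {
    # Gemma is a gated model: accept its license on Hugging Face and run
    # `huggingface-cli login` before calling this function.
    huggingface-cli download google/gemma-2b-it --local-dir weights/Gemma &&
    huggingface-cli download google-bert/bert-base-uncased --local-dir weights/SAM
}
```

After defining the function, run `fetch_weights` from the project root. Downloading through a browser or `git clone` works just as well, as long as the files end up in the same folders.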
Make sure your environment meets the following requirements:
- Python >= 3.9
- PyTorch == 2.6.0
- transformers == 4.53.0
- timm == 1.0.6
- six
- accelerate
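The pinned versions above can be captured in a requirements file so the environment is reproducible. This is a sketch under assumptions: the file name `requirements-pinned.txt` is hypothetical (chosen to avoid overwriting any file the repository already ships), and `torch` is the PyPI package name for PyTorch.

```shell
# Write the version pins listed above to a requirements file.
# The file name is an assumption, not part of the repository.
cat > requirements-pinned.txt <<'EOF'
torch==2.6.0
transformers==4.53.0
timm==1.0.6
six
accelerate
EOF
```

Check the interpreter with `python3 --version` (3.9 or newer), then install with `pip install -r requirements-pinned.txt`. Depending on your CUDA setup, you may need to install the matching PyTorch 2.6.0 build first, following the official PyTorch installation instructions.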
Then, navigate to the GroundingDINO directory and install it in editable mode:
cd GroundingDINO
pip install -e .