This repository is for evaluating the basic performance of SAM on the Referring Image Segmentation task. Check out the SAM project here.
The basic approach we use is to:
- Produce a representation of the referring expression using the CLIP text encoder.
- Extract SAM masks from an image.
- Embed each masked region with the CLIP image encoder to produce a representation of that region.
- Compare each masked-region representation to the referring-expression representation and select the best-matching mask.
The code for the approach can be found in `model.py`.
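Below is a minimal sketch of this pipeline. It is illustrative rather than a copy of `model.py`: it assumes the `segment-anything` and OpenAI `clip` packages, and the helper name `segment_by_expression` is hypothetical.

```python
import clip
import numpy as np
import torch
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load SAM for mask proposals and CLIP for text/image embeddings.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
mask_generator = SamAutomaticMaskGenerator(sam)
clip_model, preprocess = clip.load("ViT-B/32", device=device)

def segment_by_expression(image: np.ndarray, expression: str) -> np.ndarray:
    """Return the SAM mask whose CLIP embedding best matches the expression.

    `image` is an HxWx3 uint8 RGB array; the result is a boolean HxW mask.
    Hypothetical sketch, not the repository's actual model.py.
    """
    # 1. Embed the referring expression with CLIP's text encoder.
    tokens = clip.tokenize([expression]).to(device)
    with torch.no_grad():
        text_feat = clip_model.encode_text(tokens)
        text_feat /= text_feat.norm(dim=-1, keepdim=True)

    # 2. Extract candidate masks from the image with SAM.
    masks = mask_generator.generate(image)

    # 3. Embed each masked region and score it against the expression.
    best_score, best_mask = -1.0, None
    for m in masks:
        seg = m["segmentation"]  # boolean HxW array
        # Black out everything outside the mask before encoding.
        region = image.copy()
        region[~seg] = 0
        pixels = preprocess(Image.fromarray(region)).unsqueeze(0).to(device)
        with torch.no_grad():
            img_feat = clip_model.encode_image(pixels)
            img_feat /= img_feat.norm(dim=-1, keepdim=True)
        score = (img_feat @ text_feat.T).item()  # cosine similarity
        if score > best_score:
            best_score, best_mask = score, seg
    return best_mask
```

Blacking out the background is just one way to isolate a region before CLIP encoding; cropping to the mask's bounding box is another common choice.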
Install SAM with:

```
pip install git+https://github.com/facebookresearch/segment-anything.git
```
I used the `sam_vit_h_4b8939.pth` model from the SAM repository. It can be found here.
Follow the directions in `prepare_dataset.md` to download and set up the evaluation dataset.
To evaluate the approach, run:

```
python evaluate_on_refcoco.py
```
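Referring image segmentation is typically scored with intersection-over-union (IoU) between the predicted and ground-truth masks. The sketch below shows that metric; the helper name `mask_iou` is illustrative and not necessarily what `evaluate_on_refcoco.py` uses.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: treat as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum()) / float(union)
```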