This is the implementation of our ECCV'24 paper "SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation", based on Qwen2-VL.
Download each dataset from its website:
- ADE20K: https://groups.csail.mit.edu/vision/datasets/ADE20K/
- PACO-LVIS: https://github.com/facebookresearch/paco/tree/main
- Part-ImageNet: https://github.com/TACJu/PartImageNet
- RefCOCO: https://github.com/lichengunc/refer
- GRES: https://github.com/henghuiding/ReLA
Put all of them under the dataset directory so that you get the following layout:
dataset/ (put your dataset)
| ├──ADE20K_2021_17_01/
| | ├── images
| ├──PACO/
| | ├──paco_lvis_v1_test.json
| | ├──paco_lvis_v1_train.json
| | ├──paco_lvis_v1_val.json
| ├──Part-ImageNet/
| | ├── annotations
| | ├── images
| ├──RefCOCO/
| | ├── refcoco
| | | ├── instances.json
| | | ├── refs(unc).p
| | ├── refcoco+
| | | ├── instances.json
| | | ├── refs(unc).p
| | ├── refcocog
| | | ├── instances.json
| | | ├── refs(unc).p
| ├──GRES/
| | ├── grefs(unc).json
| | ├── instances.json
| ├──COCO/
| | ├── train2017
| | ├── val2017
qwen2vl-SAM4MLLM/
| ├── data/
| | ├── sam_checkpoint/
| | | ├── effvit_xl1_decoder_coco_ft.pt
| | | ├── xl1.pt (download from Google Drive [1])
| | ├── ade20k_ref_data.json (generated by ./data/ade20k.ipynb)
| | ├── paco_ref_data.json (generated by ./data/paco_lvis.ipynb)
| | ├── refcoco_gres.json (generated by ./data/refcoco_gres.ipynb)
| | ├── partimagenet_ref_data.json (generated by ./data/part_imagenet.ipynb)
| | ├── convs_ep1.json (generated by ./to_chat_format.ipynb)
| ├── LLaMA-Factory/
| | ├── data/
| | | ├── sam4mllm-qwen2vl.json (download from Google Drive [1])
[1] Google Drive
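The preparation notebooks expect the directory names shown above. Below is a minimal sanity-check sketch that simply mirrors that layout; DATASET_ROOT and REPO_ROOT are placeholders you should point at your actual paths:

```python
# check_layout.py -- quick check that the expected files/directories exist.
# The path list mirrors the layout above; adjust the two roots to your setup.
from pathlib import Path

DATASET_ROOT = Path("dataset")
REPO_ROOT = Path("qwen2vl-SAM4MLLM")

expected = [
    DATASET_ROOT / "ADE20K_2021_17_01/images",
    DATASET_ROOT / "PACO/paco_lvis_v1_train.json",
    DATASET_ROOT / "Part-ImageNet/annotations",
    DATASET_ROOT / "RefCOCO/refcoco/instances.json",
    DATASET_ROOT / "GRES/grefs(unc).json",
    DATASET_ROOT / "COCO/train2017",
    REPO_ROOT / "data/sam_checkpoint/effvit_xl1_decoder_coco_ft.pt",
    REPO_ROOT / "data/sam_checkpoint/xl1.pt",
]

missing = [p for p in expected if not p.exists()]
for p in missing:
    print(f"MISSING: {p}")
print("Layout OK" if not missing else f"{len(missing)} path(s) missing")
```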
- Create a virtual environment and install the requirements
conda create -n sam4mllm python=3.11
conda activate sam4mllm
pip install -r requirements.txt
- LLaMA-Factory
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
- Flash-attn
pip install flash-attn --no-build-isolation
- Efficient-ViT
git clone https://github.com/mit-han-lab/efficientvit.git
cd efficientvit
pip install -U -r requirements.txt
- Transformers
Please refer to this PR to modify the code in transformers: Link
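After installation, a quick import check can catch a broken flash-attn or CUDA setup early. This is a generic sketch, nothing repo-specific:

```python
# env_check.py -- verify the core dependencies import and CUDA is visible.
import torch
import transformers
import flash_attn  # fails here if flash-attn was not built for this torch/CUDA combo

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("flash-attn:", flash_attn.__version__)
```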
Run the following notebooks to arrange the data (remember to set the data path in each notebook):
- ./data/ade20k.ipynb
- ./data/refcoco_gres.ipynb
- ./data/paco_lvis.ipynb
- ./data/part_imagenet.ipynb
Then generate the dialogue-format training data:
- ./to_chat_format.ipynb
Note: to_chat_format.ipynb needs the JSON files ade20k_ref_data.json, paco_ref_data.json, refcoco_gres.json, and partimagenet_ref_data.json in order to generate convs_ep1.json, the dialogue-format training data. Alternatively, you can download convs_ep1.json directly from Google Drive [1] and place it in ./data.
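If you generate the JSON files yourself, the sketch below just loads each one and reports its entry count. It assumes each file is a top-level JSON list, which may not match the actual format, so treat it only as a quick smoke test:

```python
# inspect_ref_data.py -- load the generated JSON files and report entry counts.
# Assumes each file is a JSON array of samples; adjust if your files differ.
import json
from pathlib import Path

DATA_DIR = Path("data")  # i.e. qwen2vl-SAM4MLLM/data
files = [
    "ade20k_ref_data.json",
    "paco_ref_data.json",
    "refcoco_gres.json",
    "partimagenet_ref_data.json",
    "convs_ep1.json",
]

for name in files:
    path = DATA_DIR / name
    if not path.exists():
        print(f"{name}: not found")
        continue
    with open(path) as f:
        data = json.load(f)
    print(f"{name}: {len(data)} entries")
```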
Then generate the Qwen2-VL training data for LLaMA-Factory (remember to set DATA_PATH and LLaMA_Factory_PATH in the script):
python convert_llava_qwen2.py
- Train
cd LLaMA-Factory
llamafactory-cli train examples/train_lora/qwen2vl_lora_sft_sam4mllm.yaml
- Export Model
llamafactory-cli export examples/merge_lora/qwen2vl_lora_sft_sam4mllm.yaml
- Web Chat / API
llamafactory-cli webchat examples/inference/qwen2_vl_sam4mllm.yaml
llamafactory-cli api examples/inference/qwen2_vl_sam4mllm.yaml
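The api command serves an OpenAI-compatible endpoint. The sketch below assumes the default address http://localhost:8000 and the /v1/chat/completions route; the model name, image file, and prompt are placeholders to adapt to your deployment:

```python
# query_api.py -- example request against the LLaMA-Factory API server.
# Assumes the default http://localhost:8000 and the OpenAI-compatible
# /v1/chat/completions route; adjust the payload to your setup.
import base64
import requests

with open("example.jpg", "rb") as f:  # placeholder image
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "sam4mllm-qwen2vl",  # placeholder; the server answers with the loaded model
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                {"type": "text",
                 "text": "Point to the person wearing a red jacket."},  # example referring expression
            ],
        }
    ],
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```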