Shanghai Artificial Intelligence Laboratory
We propose:
- MLLM-DataEngine, a novel closed-loop system that bridges data generation, model training, and evaluation.
- DataEngine-InstData, high-quality and targeted VQA data generated by MLLM-DataEngine, also referred to as GPTVQA in the paper.
1. Prepare the code and the environment
Git clone our repository, create a python environment, and activate it via the following commands:
git clone https://github.com/JulioZhao97/MLLM-DataEngine.git
cd MLLM-DataEngine
conda env create -f environment.yml
conda activate minigpt4
pip3 install torch==2.0.0+cu117 torchvision==0.15.1+cu117 --index-url https://download.pytorch.org/whl/cu117
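To quickly confirm that the environment resolved the intended CUDA build of PyTorch, you can run a small sanity check (an optional sketch, not part of the official setup):

import torch
print(torch.__version__)          # expected to report 2.0.0+cu117 with the install above
print(torch.cuda.is_available())  # True if a compatible GPU and driver are visible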
2. Prepare the pretrained Vicuna weights
The current version of MLLM-DataEngine is built on the v0 version of Vicuna. Please refer to the instructions here to prepare the Vicuna weights. The final weights should sit in a single folder with a structure similar to the following:
vicuna_weights
├── config.json
├── generation_config.json
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
...
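As an optional sanity check, the prepared folder should load as a standard HuggingFace checkpoint. The sketch below assumes the transformers package from the environment above and a local folder named vicuna_weights; it only reads the config and tokenizer, so it is cheap to run:

from transformers import AutoConfig, AutoTokenizer

path = "vicuna_weights"                     # adjust to wherever you placed the weights
config = AutoConfig.from_pretrained(path)   # reads config.json
tokenizer = AutoTokenizer.from_pretrained(path)
print(config.model_type, config.num_hidden_layers)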
3. Prepare the stage1 pretrained MiniGPT-4 checkpoint
Download the stage1 pretrained checkpoint corresponding to the Vicuna model you prepared.
Checkpoint Aligned with Vicuna 7B | Checkpoint Aligned with Vicuna 13B |
---|---|
Download | Download |
4. Data Preparation
- Download COCO2017 images and annotations from the official website here.
- Put the COCO images and annotations under data/
- Download the following datasets, uncompress them, and put them under data/
A-OKVQA | CCSBUAlign | DataEngine-InstData |
---|---|---|
download | download | download |
- Finally, check that the data structure matches the following layout:
.
├── A-OKVQA
│ ├── aokvqa_v1p0_train.json
│ └── aokvqa_v1p0_val_classified.json # with question type assigned by GPT-4
├── cc_sbu_align
│ ├── filter_cap.json
│ └── image
├── COCO2017
│ ├── annotations
│ │ └── ...
│ ├── train2017
│ │ └── ...
│ └── val2017
│ └── ...
└── gptvqa
├── DataEngine_round1_data.json
└── DataEngine_round2_data.json
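Optionally, the expected layout can be verified with a short script; the sketch below assumes the tree above sits under data/ and simply reports which paths are present:

from pathlib import Path

root = Path("data")  # the tree above is assumed to sit under data/
expected = [
    "A-OKVQA/aokvqa_v1p0_train.json",
    "A-OKVQA/aokvqa_v1p0_val_classified.json",
    "cc_sbu_align/filter_cap.json",
    "cc_sbu_align/image",
    "COCO2017/annotations",
    "COCO2017/train2017",
    "COCO2017/val2017",
    "gptvqa/DataEngine_round1_data.json",
    "gptvqa/DataEngine_round2_data.json",
]
for rel in expected:
    status = "ok" if (root / rel).exists() else "MISSING"
    print(f"{status:8s}{rel}")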
1. Model tuning
We add the round1 and round2 data of DataEngine-InstData to instruct tuning, during which we use LoRA to finetune the LLM; the LoRA weights are saved in the checkpoint. In our experiments, we use 4 A100 GPUs.
In the train config, set llama_model to the path of the Vicuna model and ckpt to the stage1 pretrained MiniGPT-4 model. Then run the following command, where --cfg-path points to your train config and NUM_GPU is the number of GPUs you use.
torchrun --nproc-per-node NUM_GPU train.py --cfg-path path/to/config
Train configs, finetuned models (LoRA weights), and results are shown in the following table.
LLM | Base SFT Data | Round1 | Round2 | Format | MMBench dev | A-OKVQA val (MC) | A-OKVQA val (DA) | config | checkpoint (LoRA weight) |
---|---|---|---|---|---|---|---|---|---|
7B | CCSBUAlign, A-OKVQA | | | QMA | 45.8 | 70.2 | 59.1 | config | model |
7B | CCSBUAlign, A-OKVQA | ✔️ | | QMA | 47.1 | 71.8 | 60.8 | config | model |
7B | CCSBUAlign, A-OKVQA | ✔️ | ✔️ | QMA | 52.7 | 73.6 | 62.0 | config | model |
7B | CCSBUAlign, A-OKVQA | | | QMAE | 25.7 | 71.0 | 59.1 | config | model |
7B | CCSBUAlign, A-OKVQA | ✔️ | | QMAE | 40.6 | 71.0 | 60.3 | config | model |
7B | CCSBUAlign, A-OKVQA | ✔️ | ✔️ | QMAE | 46.7 | 72.1 | 61.0 | config | model |
13B | CCSBUAlign, A-OKVQA | | | QMA | 52.6 | 74.8 | 62.6 | config | model |
13B | CCSBUAlign, A-OKVQA | ✔️ | | QMA | 52.5 | 74.7 | 63.1 | config | model |
13B | CCSBUAlign, A-OKVQA | ✔️ | ✔️ | QMA | 56.1 | 75.5 | 63.3 | config | model |
13B | CCSBUAlign, A-OKVQA | | | QMAE | 46.1 | 73.1 | 62.1 | config | model |
13B | CCSBUAlign, A-OKVQA | ✔️ | | QMAE | 48.1 | 74.5 | 62.4 | config | model |
13B | CCSBUAlign, A-OKVQA | ✔️ | ✔️ | QMAE | 49.2 | 74.0 | 61.9 | config | model |
2. Merge LoRA weight into LLM
After finetuning, run the following command to merge the LoRA weight into the LLM:
python apply_lora_delta.py --base-model path/to/vicuna/weight \
--ckpt path/to/lora/weight \
--target path/to/merged/llm
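apply_lora_delta.py is the script provided by this repo. As a rough illustration of what the merge step amounts to, here is a sketch using the peft API (not the repo's implementation), with the same placeholder paths as above:

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained("path/to/vicuna/weight", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "path/to/lora/weight")  # attach the finetuned LoRA adapters
merged = model.merge_and_unload()                               # fold the adapters into the base weights
merged.save_pretrained("path/to/merged/llm")                    # this folder is then used as llama_model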
For evaluation on A-OKVQA, run the following command:
torchrun --nproc-per-node NUM_GPU --master-port $RANDOM train.py --cfg-path eval_configs/minigpt4_eval.yaml
In eval_configs/minigpt4_eval.yaml, change llama_model to the path of the merged LLM, and set ckpt to the stage1 pretrained MiniGPT-4 model.
During evaluation, three result files are stored under engine_pipeline/data:
- aokvqa_eval.json: stores each question and the corresponding model answer.
- bad_case_aokvqa_classified.json: stores the questions the model answered wrongly, grouped by question type (bad cases).
- weight.json: stores the model's error rate on each question type.
These files are used in the follow-up data-engine pipeline.
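The exact fields inside these files are defined by the evaluation script, so the sketch below only loads each file and reports its top-level size, which is enough to confirm the run produced them:

import json
from pathlib import Path

out_dir = Path("engine_pipeline/data")
for name in ("aokvqa_eval.json", "bad_case_aokvqa_classified.json", "weight.json"):
    with open(out_dir / name) as f:
        obj = json.load(f)
    size = len(obj) if isinstance(obj, (list, dict)) else 1
    print(f"{name}: {type(obj).__name__} with {size} top-level entries")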
To evaluate on MMBench, install OpenCompass according to the following steps:
- Install opencompass
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/InternLM/opencompass.git
cd opencompass
pip install -e .
- Prepare the OpenCompass MiniGPT-4 environment according to here
cd opencompass/multimodal/models/minigpt_4
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
- Install mmpretrain
pip install openmim
git clone https://github.com/open-mmlab/mmpretrain.git
cd mmpretrain
mim install -e .
mim install -e ".[multimodal]"
- Install other packages
pip install decord timm omegaconf webdataset peft openpyxl iopath
After the OpenCompass environment is prepared, set the dataset path and model path in the evaluation config file. The evaluation config used here is configs/multimodal/minigpt_4/minigpt_4_7b_mmbench.py.
- Download MMBench dev from here
- Set the dataset path:
dataset = dict(type='opencompass.MMBenchDataset',
               data_file='path/to/mmbench_dev_20230712.tsv',
               pipeline=val_pipeline)
- Set the model path: set llama_model to the finetuned, weight-merged LLM you prepared, and minigpt_4_mmbench_load_from to the corresponding stage1 pretrained MiniGPT-4 model.
# model settings
minigpt_4_mmbench_model = dict(
    type='minigpt-4',
    low_resource=False,
    llama_model='/path/to/vicuna-7b/',
    prompt_constructor=dict(type=MiniGPT4MMBenchPromptConstructor,
                            image_prompt='###Human: <Img><ImageHere></Img>',
                            reply_prompt='###Assistant:'),
    post_processor=dict(type=MiniGPT4MMBenchPostProcessor))

# evaluation settings
minigpt_4_mmbench_evaluator = [
    dict(type='opencompass.DumpResults',
         save_path='work_dirs/minigpt-4-7b-mmbench.xlsx')
]

minigpt_4_mmbench_load_from = '/path/to/prerained_minigpt4_7b.pth'  # noqa
After everything is prepared, follow the command here to evaluate on MMBench dev.
For data generation in the data-engine pipeline, please refer to here.
- MiniGPT-4: This repository uses MiniGPT-4 as its codebase and base model.
- Lavis: A fantastic vision-language model codebase.
- Vicuna: A strong, open-source language model used by many MLLM works.
If you're using MLLM-DataEngine in your research or applications, please cite using this BibTeX:
@misc{zhao2023mllmdataengine,
      title={MLLM-DataEngine: An Iterative Refinement Approach for MLLM},
      author={Zhiyuan Zhao and Linke Ouyang and Bin Wang and Siyuan Huang and Pan Zhang and Xiaoyi Dong and Jiaqi Wang and Conghui He},
      year={2023},
      eprint={2308.13566},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}