Shanghai Artificial Intelligence Laboratory
We propose:
- MLLM-DataEngine, a novel closed-loop system that bridges data generation, model training, and evaluation.
- DataEngine-InstData, high-quality and targeted VQA data generated by MLLM-DataEngine, also referred to as GPTVQA in the paper.
1. Prepare the code and the environment
Git clone our repository, create a python environment, and activate it via the following commands:
git clone https://github.com/JulioZhao97/MLLM-DataEngine.git
cd MLLM-DataEngine
conda env create -f environment.yml
conda activate minigpt4
pip3 install torch==2.0.0+cu117 torchvision==0.15.1+cu117 --index-url https://download.pytorch.org/whl/cu117
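To quickly confirm that the environment resolved the intended CUDA build of PyTorch, you can run a small sanity check (an optional sketch, not part of the official setup):

import torch
print(torch.__version__)          # expected to report 2.0.0+cu117 with the install above
print(torch.cuda.is_available())  # True if a compatible GPU and driver are visible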
2. Prepare the pretrained Vicuna weights
The current version of MLLM-DataEngine is built on the v0 version of Vicuna. Please refer to the instructions here to prepare the Vicuna weights. The final weights should sit in a single folder with a structure similar to the following:
vicuna_weights
├── config.json
├── generation_config.json
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
...
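As an optional sanity check, the prepared folder should load as a standard HuggingFace checkpoint. The sketch below assumes the transformers package from the environment above and a local folder named vicuna_weights; it only reads the config and tokenizer, so it is cheap to run:

from transformers import AutoConfig, AutoTokenizer

path = "vicuna_weights"                     # adjust to wherever you placed the weights
config = AutoConfig.from_pretrained(path)   # reads config.json
tokenizer = AutoTokenizer.from_pretrained(path)
print(config.model_type, config.num_hidden_layers)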
3. Prepare the stage1 pretrained MiniGPT-4 checkpoint
Download the stage1 pretrained checkpoint corresponding to the Vicuna model you prepared.
Checkpoint Aligned with Vicuna 7B | Checkpoint Aligned with Vicuna 13B |
---|---|
Download | Download |
4. Data Preparation
- Download COCO2017 images and annotations from the official website here.
- Put the COCO images and annotations under data/
- Download the following datasets, uncompress them, and put them under data/
A-OKVQA | CCSBUAlign | DataEngine-InstData |
---|---|---|
download | download | download |
- Finally, check that the data structure matches the following layout:
.
├── A-OKVQA
│ ├── aokvqa_v1p0_train.json
│ └── aokvqa_v1p0_val_classified.json # with question type assigned by GPT-4
├── cc_sbu_align
│ ├── filter_cap.json
│ └── image
├── COCO2017
│ ├── annotations
│ │ └── ...
│ ├── train2017
│ │ └── ...
│ └── val2017
│ └── ...
└── gptvqa
├── DataEngine_round1_data.json
└── DataEngine_round2_data.json
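Optionally, the expected layout can be verified with a short script; the sketch below assumes the tree above sits under data/ and simply reports which paths are present:

from pathlib import Path

root = Path("data")  # the tree above is assumed to sit under data/
expected = [
    "A-OKVQA/aokvqa_v1p0_train.json",
    "A-OKVQA/aokvqa_v1p0_val_classified.json",
    "cc_sbu_align/filter_cap.json",
    "cc_sbu_align/image",
    "COCO2017/annotations",
    "COCO2017/train2017",
    "COCO2017/val2017",
    "gptvqa/DataEngine_round1_data.json",
    "gptvqa/DataEngine_round2_data.json",
]
for rel in expected:
    status = "ok" if (root / rel).exists() else "MISSING"
    print(f"{status:8s}{rel}")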
1. Model tuning
We add the round1 and round2 data of DataEngine-InstData to instruct tuning, during which we use LoRA to finetune the LLM; the LoRA weights are saved in the checkpoint. In our experiments, we use 4 A100 GPUs.
In the train config, set llama_model to the path of the Vicuna model and ckpt to the stage1 pretrained MiniGPT-4 model. Then run the following command, where --cfg-path points to your train config and NUM_GPU is the number of GPUs you use.
torchrun --nproc-per-node NUM_GPU train.py --cfg-path path/to/config
Train configs, finetuned models (LoRA weights), and results are shown in the following table.
LLM | Base SFT Data | Round1 | Round2 | Format | MMBench dev | A-OKVQA val (MC) | A-OKVQA val (DA) | config | checkpoint (LoRA weight) |
---|---|---|---|---|---|---|---|---|---|
7B | CCSBUAlign, A-OKVQA | | | QMA | 45.8 | 70.2 | 59.1 | config | model |
7B | CCSBUAlign, A-OKVQA | ✔️ | | QMA | 47.1 | 71.8 | 60.8 | config | model |
7B | CCSBUAlign, A-OKVQA | ✔️ | ✔️ | QMA | 52.7 | 73.6 | 62.0 | config | model |
7B | CCSBUAlign, A-OKVQA | | | QMAE | 25.7 | 71.0 | 59.1 | config | model |
7B | CCSBUAlign, A-OKVQA | ✔️ | | QMAE | 40.6 | 71.0 | 60.3 | config | model |
7B | CCSBUAlign, A-OKVQA | ✔️ | ✔️ | QMAE | 46.7 | 72.1 | 61.0 | config | model |
13B | CCSBUAlign, A-OKVQA | | | QMA | 52.6 | 74.8 | 62.6 | config | model |
13B | CCSBUAlign, A-OKVQA | ✔️ | | QMA | 52.5 | 74.7 | 63.1 | config | model |
13B | CCSBUAlign, A-OKVQA | ✔️ | ✔️ | QMA | 56.1 | 75.5 | 63.3 | config | model |
13B | CCSBUAlign, A-OKVQA | | | QMAE | 46.1 | 73.1 | 62.1 | config | model |
13B | CCSBUAlign, A-OKVQA | ✔️ | | QMAE | 48.1 | 74.5 | 62.4 | config | model |
13B | CCSBUAlign, A-OKVQA | ✔️ | ✔️ | QMAE | 49.2 | 74.0 | 61.9 | config | model |
2. Merge LoRA weight into LLM
After finetuning, run the following command to merge the LoRA weight into the LLM:
python apply_lora_delta.py --base-model path/to/vicuna/weight \
--ckpt path/to/lora/weight \
--target path/to/merged/llm
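apply_lora_delta.py is the script provided by this repo. As a rough illustration of what the merge step amounts to, here is a sketch using the peft API (not the repo's implementation), with the same placeholder paths as above:

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained("path/to/vicuna/weight", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "path/to/lora/weight")  # attach the finetuned LoRA adapters
merged = model.merge_and_unload()                               # fold the adapters into the base weights
merged.save_pretrained("path/to/merged/llm")                    # this folder is then used as llama_model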
For evaluation on A-OKVQA, run the following command:
torchrun --nproc-per-node NUM_GPU --master-port $RANDOM train.py --cfg-path eval_configs/minigpt4_eval.yaml
In eval_configs/minigpt4_eval.yaml, change llama_model to the path of the merged LLM, and set ckpt to the stage1 pretrained MiniGPT-4 model.
During evaluation, three result files are stored under engine_pipeline/data:
- aokvqa_eval.json: stores each question and the corresponding model answer.
- bad_case_aokvqa_classified.json: stores the questions the model answered wrongly, grouped by question type (bad cases).
- weight.json: stores the model's error rate on each question type.
These files are used in the follow-up data-engine pipeline.
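The exact fields inside these files are defined by the evaluation script, so the sketch below only loads each file and reports its top-level size, which is enough to confirm the run produced them:

import json
from pathlib import Path

out_dir = Path("engine_pipeline/data")
for name in ("aokvqa_eval.json", "bad_case_aokvqa_classified.json", "weight.json"):
    with open(out_dir / name) as f:
        obj = json.load(f)
    size = len(obj) if isinstance(obj, (list, dict)) else 1
    print(f"{name}: {type(obj).__name__} with {size} top-level entries")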
To evaluate on MMBench, install OpenCompass according to the following steps:
- Install opencompass
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/InternLM/opencompass.git
cd opencompass
pip install -e .
- Prepare the OpenCompass MiniGPT-4 environment according to here
cd opencompass/multimodal/models/minigpt_4
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
- Install mmpretrain
pip install openmim
git clone https://github.com/open-mmlab/mmpretrain.git
cd mmpretrain
mim install -e .
mim install -e ".[multimodal]"
- Install other packages
pip install decord timm omegaconf webdataset peft openpyxl iopath
After the OpenCompass environment is prepared, set the dataset path and model path in the evaluation config file. The evaluation config used here is configs/multimodal/minigpt_4/minigpt_4_7b_mmbench.py.
- Download MMBench dev from here
- Set the dataset path:
dataset = dict(type='opencompass.MMBenchDataset',
               data_file='path/to/mmbench_dev_20230712.tsv',
               pipeline=val_pipeline)
- Set the model path: set llama_model to the finetuned, weight-merged LLM you prepared, and minigpt_4_mmbench_load_from to the corresponding stage1 pretrained MiniGPT-4 model.
# model settings
minigpt_4_mmbench_model = dict(
    type='minigpt-4',
    low_resource=False,
    llama_model='/path/to/vicuna-7b/',
    prompt_constructor=dict(type=MiniGPT4MMBenchPromptConstructor,
                            image_prompt='###Human: <Img><ImageHere></Img>',
                            reply_prompt='###Assistant:'),
    post_processor=dict(type=MiniGPT4MMBenchPostProcessor))

# evaluation settings
minigpt_4_mmbench_evaluator = [
    dict(type='opencompass.DumpResults',
         save_path='work_dirs/minigpt-4-7b-mmbench.xlsx')
]

minigpt_4_mmbench_load_from = '/path/to/prerained_minigpt4_7b.pth'  # noqa
After everything is prepared, follow the command here to evaluate on MMBench dev.
For data generation in the data-engine pipeline, please refer to here.
- MiniGPT-4: This repository uses MiniGPT-4 as its codebase and base model.
- Lavis: A fantastic vision-language model codebase.
- Vicuna: A strong, open-source language model used by many MLLM works.
If you're using MLLM-DataEngine in your research or applications, please cite using this BibTeX:
@misc{zhao2023mllmdataengine,
      title={MLLM-DataEngine: An Iterative Refinement Approach for MLLM},
      author={Zhiyuan Zhao and Linke Ouyang and Bin Wang and Siyuan Huang and Pan Zhang and Xiaoyi Dong and Jiaqi Wang and Conghui He},
      year={2023},
      eprint={2308.13566},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}