First, enter the `vlmevalkit` directory and install all dependencies:
```bash
cd vlmevalkit
pip install --upgrade pip
pip install -e .
wget https://download.pytorch.org/whl/cu118/torch-2.2.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=4377e0a7fe8ff8ffc4f7c9c6130c1dcd3874050ae4fc28b7ff1d35234fbca423
wget https://download.pytorch.org/whl/cu118/torchvision-0.17.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=2e63d62e09d9b48b407d3e1b30eb8ae4e3abad6968e8d33093b60d0657542428
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu118torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install torch-2.2.0%2Bcu118-cp310-cp310-linux_x86_64.whl
pip install torchvision-0.17.0%2Bcu118-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.6.3+cu118torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```
Then, run `scripts/run_inference.sh`, which receives two input parameters in sequence: `MODELNAME` and `DATALIST`. `MODELNAME` represents the name of the model, and `DATALIST` represents the datasets used for inference:
```bash
chmod +x ./scripts/run_inference.sh
./scripts/run_inference.sh $MODELNAME $DATALIST
```
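For example, to run `MiniCPM-o-2_6` (one of the model names listed below) on the MMStar benchmark alone:

```bash
./scripts/run_inference.sh MiniCPM-o-2_6 MMStar
```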
The five available choices for `MODELNAME` are listed in `vlmeval/config.py`:
```python
minicpm_series = {
    'MiniCPM-V': partial(MiniCPM_V, model_path='openbmb/MiniCPM-V'),
    'MiniCPM-V-2': partial(MiniCPM_V, model_path='openbmb/MiniCPM-V-2'),
    'MiniCPM-Llama3-V-2_5': partial(MiniCPM_Llama3_V, model_path='openbmb/MiniCPM-Llama3-V-2_5'),
    'MiniCPM-V-2_6': partial(MiniCPM_V_2_6, model_path='openbmb/MiniCPM-V-2_6'),
    'MiniCPM-o-2_6': partial(MiniCPM_o_2_6, model_path='openbmb/MiniCPM-o-2_6'),
}
```
All available choices for `DATALIST` are listed in `vlmeval/utils/dataset_config.py`. When evaluating on multiple datasets at a time, separate the dataset names with spaces and wrap the whole list in quotation marks:
```bash
DATALIST="MMMU_DEV_VAL MathVista_MINI MMVet MMBench_DEV_EN_V11 MMBench_DEV_CN_V11 MMStar HallusionBench AI2D_TEST"
```
When a benchmark requires a GPT-series model for scoring, specify `OPENAI_API_BASE` and `OPENAI_API_KEY` in the `.env` file.
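For reference, a minimal `.env` sketch; both values below are placeholders rather than working credentials:

```bash
OPENAI_API_BASE=https://api.openai.com/v1  # placeholder; point this at your endpoint
OPENAI_API_KEY=sk-...                      # placeholder; use your own key
```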
To reproduce the results on the OpenCompass benchmarks together with ChartQA and MME, as displayed in the table on the homepage (columns between OCRBench and HallusionBench), run the script with the following settings:
```bash
# Note that we use different prompts for the perception and reasoning sets of MME.
# Evaluating the reasoning subset requires CoT, so you need to manually modify the
# judgment condition of the use_cot function in vlmeval/vlm/minicpm_v.py.
./scripts/run_inference.sh MiniCPM-o-2_6 "MMMU_DEV_VAL MathVista_MINI MMVet MMBench_TEST_EN_V11 MMBench_TEST_CN_V11 MMStar HallusionBench AI2D_TEST OCRBench ChartQA_TEST MME"
```
First, enter the `vqaeval` directory and install all dependencies. Then, create a `downloads` subdirectory to store the downloaded datasets for all tasks:
```bash
cd vqaeval
pip install -r requirements.txt
mkdir downloads
```
Download the datasets from the following links and place them in the specified directories:
```bash
# TextVQA
cd downloads
mkdir TextVQA && cd TextVQA
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
unzip train_val_images.zip && rm train_val_images.zip
mv train_val_images/train_images . && rm -rf train_val_images
wget https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_val.json
cd ../..
```
```bash
# DocVQA
cd downloads
mkdir DocVQA && cd DocVQA && mkdir spdocvqa_images
# Download "Images" and "Annotations" from Task 1 - Single Page Document Visual
# Question Answering at https://rrc.cvc.uab.es/?ch=17&com=downloads,
# then move spdocvqa_images.tar.gz and spdocvqa_qas.zip into the DocVQA directory.
tar -zxvf spdocvqa_images.tar.gz -C spdocvqa_images && rm spdocvqa_images.tar.gz
unzip spdocvqa_qas.zip && rm spdocvqa_qas.zip
cp spdocvqa_qas/val_v1.0_withQT.json . && cp spdocvqa_qas/test_v1.0.json . && rm -rf spdocvqa_qas
cd ../..
```
The `downloads` directory should be organized according to the following structure:
```
downloads
├── TextVQA
│   ├── train_images
│   │   ├── ...
│   ├── TextVQA_0.5.1_val.json
├── DocVQA
│   ├── spdocvqa_images
│   │   ├── ...
│   ├── val_v1.0_withQT.json
│   ├── test_v1.0.json
```
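To sanity-check the layout after downloading, you can list the top two levels of the tree:

```bash
find downloads -maxdepth 2
```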
Modify the parameters in `shell/run_inference.sh` and run inference:
```bash
chmod +x ./shell/run_inference.sh
./shell/run_inference.sh
```
All optional parameters are listed in `eval_utils/getargs.py`. The meanings of some major parameters are as follows. For `MiniCPM-o-2_6`, set `model_name` to `minicpmo26`:
```bash
# paths to the images and their corresponding questions
# TextVQA
--textVQA_image_dir
--textVQA_ann_path
# DocVQA
--docVQA_image_dir
--docVQA_ann_path
# DocVQATest
--docVQATest_image_dir
--docVQATest_ann_path

# whether to evaluate on a certain task
--eval_textVQA
--eval_docVQA
--eval_docVQATest
--eval_all

# model name and model path
--model_name
--model_path
# load the model from a checkpoint
--ckpt
# how the model processes input data: "interleave" means interleaved
# image-text form, while "old" means non-interleaved
--generate_method
# inference batch size
--batchsize

# path to save the outputs
--answer_path
```
To evaluate on the different tasks, set the parameters as follows:
```bash
# TextVQA
--eval_textVQA
--textVQA_image_dir ./downloads/TextVQA/train_images
--textVQA_ann_path ./downloads/TextVQA/TextVQA_0.5.1_val.json

# DocVQA (validation set)
--eval_docVQA
--docVQA_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQA_ann_path ./downloads/DocVQA/val_v1.0_withQT.json

# DocVQATest
--eval_docVQATest
--docVQATest_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQATest_ann_path ./downloads/DocVQA/test_v1.0.json
```
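Putting these together, here is a minimal sketch of a full TextVQA invocation; the entry-point script name (`eval.py`) and the `./answers` output directory are assumptions, so check `shell/run_inference.sh` for the actual values it uses:

```bash
# minimal sketch; eval.py and ./answers are assumptions, not confirmed names
python eval.py \
    --model_name minicpmo26 \
    --model_path openbmb/MiniCPM-o-2_6 \
    --generate_method interleave \
    --batchsize 1 \
    --eval_textVQA \
    --textVQA_image_dir ./downloads/TextVQA/train_images \
    --textVQA_ann_path ./downloads/TextVQA/TextVQA_0.5.1_val.json \
    --answer_path ./answers
```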
For the DocVQATest task, in order to upload the inference results to the official website for evaluation, run `shell/run_transform.sh` for format transformation after inference. `input_file_path` is the path to the original output JSON, and `output_file_path` is the path to the transformed JSON:
```bash
chmod +x ./shell/run_transform.sh
./shell/run_transform.sh
```
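For instance, if the script exposes the two paths as plain shell variables, an illustrative configuration (both file names below are hypothetical) might be:

```bash
# illustrative values only; both paths are hypothetical
input_file_path=./answers/docVQATest/minicpmo26.json
output_file_path=./answers/docVQATest/submission.json
```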
First, enter the `vlmevalkit` directory and install all dependencies:
```bash
cd vlmevalkit
pip install --upgrade pip
pip install -e .
wget https://download.pytorch.org/whl/cu118/torch-2.2.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=4377e0a7fe8ff8ffc4f7c9c6130c1dcd3874050ae4fc28b7ff1d35234fbca423
wget https://download.pytorch.org/whl/cu118/torchvision-0.17.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=2e63d62e09d9b48b407d3e1b30eb8ae4e3abad6968e8d33093b60d0657542428
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu118torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install torch-2.2.0%2Bcu118-cp310-cp310-linux_x86_64.whl
pip install torchvision-0.17.0%2Bcu118-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.6.3+cu118torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```
Then, run `scripts/run_inference.sh`, which receives three input parameters in sequence: `MODELNAME`, `DATALIST`, and `MODE`. `MODELNAME` represents the name of the model, `DATALIST` represents the datasets used for inference, and `MODE` represents the evaluation mode:
```bash
chmod +x ./scripts/run_inference.sh
./scripts/run_inference.sh $MODELNAME $DATALIST $MODE
```
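For example, to run `MiniCPM-V-2_6` on the MMStar benchmark and score it in the same run (`MODE=all`, explained below):

```bash
./scripts/run_inference.sh MiniCPM-V-2_6 MMStar all
```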
The four available choices for `MODELNAME` are listed in `vlmeval/config.py`:
```python
minicpm_series = {
    'MiniCPM-V': partial(MiniCPM_V, model_path='openbmb/MiniCPM-V'),
    'MiniCPM-V-2': partial(MiniCPM_V, model_path='openbmb/MiniCPM-V-2'),
    'MiniCPM-Llama3-V-2_5': partial(MiniCPM_Llama3_V, model_path='openbmb/MiniCPM-Llama3-V-2_5'),
    'MiniCPM-V-2_6': partial(MiniCPM_V_2_6, model_path='openbmb/MiniCPM-V-2_6'),
}
```
All available choices for `DATALIST` are listed in `vlmeval/utils/dataset_config.py`. Separate the dataset names with spaces and wrap the whole list in quotation marks:
```bash
DATALIST="MMMU_DEV_VAL MathVista_MINI MMVet MMBench_DEV_EN_V11 MMBench_DEV_CN_V11 MMStar HallusionBench AI2D_TEST"
```
To score each benchmark directly, set `MODE=all`. If only inference results are required, set `MODE=infer`. To reproduce the results in the table displayed on the homepage (columns between MME and HallusionBench), run the script with the following settings:
```bash
# without CoT
./scripts/run_inference.sh MiniCPM-V-2_6 "MMMU_DEV_VAL MathVista_MINI MMVet MMBench_DEV_EN_V11 MMBench_DEV_CN_V11 MMStar HallusionBench AI2D_TEST" all
./scripts/run_inference.sh MiniCPM-V-2_6 MME all

# with CoT
# For the CoT version of MME, modify the use_cot function in
# vlmeval/vlm/minicpm_v.py and add MME to the branch that returns True.
./scripts/run_inference.sh MiniCPM-V-2_6 "MMMU_DEV_VAL MMVet MMStar HallusionBench OCRBench" all
./scripts/run_inference.sh MiniCPM-V-2_6 MME all
```
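A hypothetical illustration of that manual edit follows; the actual `use_cot` body in `vlmeval/vlm/minicpm_v.py` may be structured differently, but the idea is that the dataset check returning `True` should also match MME:

```python
# hypothetical sketch; align it with the real use_cot in vlmeval/vlm/minicpm_v.py
def use_cot(self, dataset):
    # add 'MME' to the datasets that receive chain-of-thought prompting
    cot_datasets = ['MMMU_DEV_VAL', 'MMVet', 'MMStar', 'HallusionBench', 'OCRBench', 'MME']
    return any(name in dataset for name in cot_datasets)
```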
First, enter the `vqaeval` directory and install all dependencies. Then, create a `downloads` subdirectory to store the downloaded datasets for all tasks:
```bash
cd vqaeval
pip install -r requirements.txt
mkdir downloads
```
Download the datasets from the following links and place them in the specified directories:
```bash
# TextVQA
cd downloads
mkdir TextVQA && cd TextVQA
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
unzip train_val_images.zip && rm train_val_images.zip
mv train_val_images/train_images . && rm -rf train_val_images
wget https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_val.json
cd ../..
```
```bash
# DocVQA
cd downloads
mkdir DocVQA && cd DocVQA && mkdir spdocvqa_images
# Download "Images" and "Annotations" from Task 1 - Single Page Document Visual
# Question Answering at https://rrc.cvc.uab.es/?ch=17&com=downloads,
# then move spdocvqa_images.tar.gz and spdocvqa_qas.zip into the DocVQA directory.
tar -zxvf spdocvqa_images.tar.gz -C spdocvqa_images && rm spdocvqa_images.tar.gz
unzip spdocvqa_qas.zip && rm spdocvqa_qas.zip
cp spdocvqa_qas/val_v1.0_withQT.json . && cp spdocvqa_qas/test_v1.0.json . && rm -rf spdocvqa_qas
cd ../..
```
The `downloads` directory should be organized according to the following structure:
```
downloads
├── TextVQA
│   ├── train_images
│   │   ├── ...
│   ├── TextVQA_0.5.1_val.json
├── DocVQA
│   ├── spdocvqa_images
│   │   ├── ...
│   ├── val_v1.0_withQT.json
│   ├── test_v1.0.json
```
Modify the parameters in `shell/run_inference.sh` and run inference:
```bash
chmod +x ./shell/run_inference.sh
./shell/run_inference.sh
```
All optional parameters are listed in `eval_utils/getargs.py`. The meanings of some major parameters are as follows. For `MiniCPM-V-2_6`, set `model_name` to `minicpmv26`:
```bash
# paths to the images and their corresponding questions
# TextVQA
--textVQA_image_dir
--textVQA_ann_path
# DocVQA
--docVQA_image_dir
--docVQA_ann_path
# DocVQATest
--docVQATest_image_dir
--docVQATest_ann_path

# whether to evaluate on a certain task
--eval_textVQA
--eval_docVQA
--eval_docVQATest
--eval_all

# model name and model path
--model_name
--model_path
# load the model from a checkpoint
--ckpt
# how the model processes input data: "interleave" means interleaved
# image-text form, while "old" means non-interleaved
--generate_method
# inference batch size
--batchsize

# path to save the outputs
--answer_path
```
To evaluate on the different tasks, set the parameters as follows:
```bash
# TextVQA
--eval_textVQA
--textVQA_image_dir ./downloads/TextVQA/train_images
--textVQA_ann_path ./downloads/TextVQA/TextVQA_0.5.1_val.json

# DocVQA (validation set)
--eval_docVQA
--docVQA_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQA_ann_path ./downloads/DocVQA/val_v1.0_withQT.json

# DocVQATest
--eval_docVQATest
--docVQATest_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQATest_ann_path ./downloads/DocVQA/test_v1.0.json
```
For the DocVQATest task, in order to upload the inference results to the official website for evaluation, run `shell/run_transform.sh` for format transformation after inference. `input_file_path` is the path to the original output JSON, and `output_file_path` is the path to the transformed JSON:
```bash
chmod +x ./shell/run_transform.sh
./shell/run_transform.sh
```
First, enter the `vlmevalkit` directory and install all dependencies:
```bash
cd vlmevalkit
pip install -r requirements.txt
```
Then, run `scripts/run_inference.sh`, which receives three input parameters in sequence: `MODELNAME`, `DATALIST`, and `MODE`. `MODELNAME` represents the name of the model, `DATALIST` represents the datasets used for inference, and `MODE` represents the evaluation mode:
```bash
chmod +x ./scripts/run_inference.sh
./scripts/run_inference.sh $MODELNAME $DATALIST $MODE
```
The three available choices for `MODELNAME` are listed in `vlmeval/config.py`:
```python
ungrouped = {
    'MiniCPM-V': partial(MiniCPM_V, model_path='openbmb/MiniCPM-V'),
    'MiniCPM-V-2': partial(MiniCPM_V, model_path='openbmb/MiniCPM-V-2'),
    'MiniCPM-Llama3-V-2_5': partial(MiniCPM_Llama3_V, model_path='openbmb/MiniCPM-Llama3-V-2_5'),
}
```
All available choices for `DATALIST` are listed in `vlmeval/utils/dataset_config.py`. When evaluating on a single dataset, pass the dataset name directly without quotation marks; when evaluating on multiple datasets, separate the dataset names with spaces and wrap the whole list in quotation marks:
```bash
DATALIST="POPE ScienceQA_TEST ChartQA_TEST"
```
To score each benchmark directly, set `MODE=all`. If only inference results are required, set `MODE=infer`. To reproduce the results in the table displayed on the homepage (columns between MME and RealWorldQA), run the script with the following settings:
```bash
# run on all 7 datasets
./scripts/run_inference.sh MiniCPM-Llama3-V-2_5 "MME MMBench_TEST_EN MMBench_TEST_CN MMMU_DEV_VAL MathVista_MINI LLaVABench RealWorldQA" all

# The following are instructions for running on a single dataset
# MME
./scripts/run_inference.sh MiniCPM-Llama3-V-2_5 MME all
# MMBench_TEST_EN
./scripts/run_inference.sh MiniCPM-Llama3-V-2_5 MMBench_TEST_EN all
# MMBench_TEST_CN
./scripts/run_inference.sh MiniCPM-Llama3-V-2_5 MMBench_TEST_CN all
# MMMU_DEV_VAL
./scripts/run_inference.sh MiniCPM-Llama3-V-2_5 MMMU_DEV_VAL all
# MathVista_MINI
./scripts/run_inference.sh MiniCPM-Llama3-V-2_5 MathVista_MINI all
# LLaVABench
./scripts/run_inference.sh MiniCPM-Llama3-V-2_5 LLaVABench all
# RealWorldQA
./scripts/run_inference.sh MiniCPM-Llama3-V-2_5 RealWorldQA all
```
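If you only need the raw predictions (for example, to score them later), the same commands work with `MODE=infer`:

```bash
./scripts/run_inference.sh MiniCPM-Llama3-V-2_5 RealWorldQA infer
```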
First, enter the `vqaeval` directory and install all dependencies. Then, create a `downloads` subdirectory to store the downloaded datasets for all tasks:
```bash
cd vqaeval
pip install -r requirements.txt
mkdir downloads
```
Download the datasets from the following links and place them in the specified directories:
```bash
# TextVQA
cd downloads
mkdir TextVQA && cd TextVQA
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
unzip train_val_images.zip && rm train_val_images.zip
mv train_val_images/train_images . && rm -rf train_val_images
wget https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_val.json
cd ../..
```
```bash
# DocVQA
cd downloads
mkdir DocVQA && cd DocVQA && mkdir spdocvqa_images
# Download "Images" and "Annotations" from Task 1 - Single Page Document Visual
# Question Answering at https://rrc.cvc.uab.es/?ch=17&com=downloads,
# then move spdocvqa_images.tar.gz and spdocvqa_qas.zip into the DocVQA directory.
tar -zxvf spdocvqa_images.tar.gz -C spdocvqa_images && rm spdocvqa_images.tar.gz
unzip spdocvqa_qas.zip && rm spdocvqa_qas.zip
cp spdocvqa_qas/val_v1.0_withQT.json . && cp spdocvqa_qas/test_v1.0.json . && rm -rf spdocvqa_qas
cd ../..
```
The `downloads` directory should be organized according to the following structure:
```
downloads
├── TextVQA
│   ├── train_images
│   │   ├── ...
│   ├── TextVQA_0.5.1_val.json
├── DocVQA
│   ├── spdocvqa_images
│   │   ├── ...
│   ├── val_v1.0_withQT.json
│   ├── test_v1.0.json
```
Modify the parameters in `shell/run_inference.sh` and run inference:
```bash
chmod +x ./shell/run_inference.sh
./shell/run_inference.sh
```
All optional parameters are listed in `eval_utils/getargs.py`. The meanings of some major parameters are as follows. For `MiniCPM-Llama3-V-2_5`, set `model_name` to `minicpmv`:
```bash
# paths to the images and their corresponding questions
# TextVQA
--textVQA_image_dir
--textVQA_ann_path
# DocVQA
--docVQA_image_dir
--docVQA_ann_path
# DocVQATest
--docVQATest_image_dir
--docVQATest_ann_path

# whether to evaluate on a certain task
--eval_textVQA
--eval_docVQA
--eval_docVQATest
--eval_all

# model name and model path
--model_name
--model_path
# load the model from a checkpoint
--ckpt
# how the model processes input data: "interleave" means interleaved
# image-text form, while "old" means non-interleaved
--generate_method
# inference batch size
--batchsize

# path to save the outputs
--answer_path
```
To evaluate on the different tasks, set the parameters as follows:
```bash
# TextVQA
--eval_textVQA
--textVQA_image_dir ./downloads/TextVQA/train_images
--textVQA_ann_path ./downloads/TextVQA/TextVQA_0.5.1_val.json

# DocVQA (validation set)
--eval_docVQA
--docVQA_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQA_ann_path ./downloads/DocVQA/val_v1.0_withQT.json

# DocVQATest
--eval_docVQATest
--docVQATest_image_dir ./downloads/DocVQA/spdocvqa_images
--docVQATest_ann_path ./downloads/DocVQA/test_v1.0.json
```
For the DocVQATest task, in order to upload the inference results to the official website for evaluation, run `shell/run_transform.sh` for format transformation after inference. `input_file_path` is the path to the original output JSON, and `output_file_path` is the path to the transformed JSON:
```bash
chmod +x ./shell/run_transform.sh
./shell/run_transform.sh
```