Code for the Paper "MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?".
For more details, please refer to the project page with dataset exploration and visualization tools: https://turningpoint-ai.github.io/MOSSBench/.
🔔 If you have any questions or suggestions, please don't hesitate to let us know. You can comment on Twitter or open an issue on this repository.
[Webpage] [Paper] [Huggingface Dataset] [Visualization] [Result Explorer] [Twitter]
Logo for MOSSBench generated by DALL·E 3.
- 💥 News 💥
- 👀 About MOSSBench
- 🏆 Leaderboard 🏆
- Contributing to the Leaderboard
- 📊 Dataset Examples
- 📖 Dataset Usage
- 🔮 Evaluations on MOSSBench
- 📜 License
- ☕ Stay Connected!
- ✅ Cite
- [2024.06.22] Our paper is now available on arXiv.
Humans are prone to cognitive distortions, biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced MLLMs exhibit similar tendencies. While these models are designed to respond to queries under safety mechanisms, they sometimes reject harmless queries in the presence of certain visual stimuli, disregarding the benign nature of their contexts.
Overview of MOSSBench. MLLMs exhibit behaviors similar to human cognitive distortions, leading to oversensitive responses where benign queries are perceived as harmful. We discover that oversensitivity prevails among existing MLLMs.
As the initial step in investigating this behavior, we identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation. To systematically evaluate MLLMs' oversensitivity to these stimuli, we propose the Multimodal OverSenSitivity Benchmark (MOSSBench). This toolkit consists of 300 manually collected benign multimodal queries, cross-verified by third-party reviewers (AMT).
Three types of stimuli in MOSSBench.
Empirical studies using MOSSBench on 20 MLLMs reveal several insights: (1) Oversensitivity is prevalent among SOTA MLLMs, with refusal rates reaching up to 76% for harmless queries. (2) Safer models are more oversensitive: increasing safety may inadvertently raise caution and conservatism in the model's responses. (3) Different types of stimuli tend to cause errors at specific stages (perception, intent reasoning, and safety decision-making) in the response process of MLLMs. These findings highlight the need for refined safety mechanisms that balance caution with contextually appropriate responses, improving the reliability of MLLMs in real-world applications.
For more details, you can find our project page here and our paper here.
🚨🚨 The leaderboard is continuously being updated.
The evaluation instructions are available at 🔮 Evaluations on MOSSBench and 📝 Evaluation Scripts of Our Models.
To submit your results to the leaderboard, please send your result file to this email (we will generate the score file for you), following the template file below:
Refusal rate (%) of MLLMs:
# | Model | Availability | Date | ALL | Exaggerated Risk | Negated Harm | Counterintuitive Interpretation |
---|---|---|---|---|---|---|---|
1 | Claude 3 Opus (web) | Proprietary MLLMs - Web version | 2024-06-22 | 70.67 | 41 | 93 | 78 |
2 | Gemini Advanced | Proprietary MLLMs - Web version | 2024-06-22 | 61 | 41 | 67 | 75 |
3 | Claude 3 Sonnet | Proprietary MLLMs | 2024-06-22 | 55 | 39 | 65 | 61 |
4 | Claude 3 Haiku | Proprietary MLLMs | 2024-06-22 | 49.33 | 27 | 58 | 63 |
5 | Claude 3 Opus | Proprietary MLLMs | 2024-06-22 | 34.67 | 11 | 43 | 55 |
6 | Gemini Pro 1.5 | Proprietary MLLMs | 2024-06-22 | 29.33 | 25 | 28 | 35 |
7 | Qwen-VL-Chat | Open-source MLLMs | 2024-06-22 | 21.67 | 16 | 13 | 36 |
8 | InternLM-Xcomposer2-7b | Open-source MLLMs | 2024-06-22 | 17.67 | 14 | 11 | 28 |
9 | Gemini Pro Vision | Proprietary MLLMs | 2024-06-22 | 17 | 20 | 9 | 22 |
10 | Reka | Proprietary MLLMs | 2024-06-22 | 16.67 | 11 | 21 | 18 |
11 | InstructBLIP-Vicuna-7b | Open-source MLLMs | 2024-06-22 | 15.67 | 21 | 23 | 3 |
12 | IDEFICS-9b-Instruct | Open-source MLLMs | 2024-06-22 | 13.67 | 17 | 9 | 15 |
13 | MiniCPM-V 2.0 | Open-source MLLMs | 2024-06-22 | 12.33 | 16 | 11 | 10 |
14 | LLaVA-1.5-7b | Open-source MLLMs | 2024-06-22 | 12.33 | 18 | 10 | 9 |
15 | mPLUG-Owl2 | Open-source MLLMs | 2024-06-22 | 10 | 11 | 7 | 12 |
16 | LLaVA-1.5-13b | Open-source MLLMs | 2024-06-22 | 9.67 | 9 | 9 | 11 |
17 | GPT-4o | Proprietary MLLMs | 2024-06-22 | 6.33 | 6 | 8 | 5 |
18 | MiniCPM-Llama3-V 2.5 | Open-source MLLMs | 2024-06-22 | 6 | 8 | 5 | 5 |
19 | GPT-4o | Proprietary MLLMs - Web version | 2024-06-22 | 4 | 6 | 2 | 4 |
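The table reports refusal rates per stimulus type and over all queries. As a minimal sketch of how such percentages could be computed from per-query refusal judgments, consider the following. The function name, the input format, and the sample data are hypothetical, and how a response is judged to be a refusal is not covered here.

```python
from collections import defaultdict


def refusal_rates(judgments):
    """Return the refusal rate (%) per stimulus type and over all samples.

    `judgments` is an iterable of (stimulus_type, refused) pairs, where
    `refused` is True when the model declined a benign query.
    This input format is an assumption made for illustration.
    """
    totals = defaultdict(int)
    refused_counts = defaultdict(int)
    for stimulus_type, refused in judgments:
        totals[stimulus_type] += 1
        refused_counts[stimulus_type] += int(refused)

    # Per-type rate: refused queries divided by all queries of that type.
    rates = {
        t: round(100.0 * refused_counts[t] / totals[t], 2) for t in totals
    }

    # Overall rate: refused queries divided by all queries.
    n_all = sum(totals.values())
    n_refused = sum(refused_counts.values())
    rates["ALL"] = round(100.0 * n_refused / n_all, 2) if n_all else 0.0
    return rates


# Hypothetical judgments, not real benchmark results.
sample = [
    ("Exaggerated Risk", True),
    ("Exaggerated Risk", False),
    ("Negated Harm", True),
    ("Negated Harm", True),
    ("Counterintuitive Interpretation", False),
    ("Counterintuitive Interpretation", False),
]
print(refusal_rates(sample))
```

When every stimulus type has the same number of queries, the overall rate equals the mean of the per-type rates.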
Examples of 3 types of oversensitivity stimuli:
- Exaggerated Risk
- Negated Harm
- Counterintuitive Interpretation
You can download this dataset with the following code (make sure that you have installed Hugging Face Datasets):
from datasets import load_dataset
dataset = load_dataset("AIcell/MOSSBench", "oversensitivity")
Here are some examples of how to access the downloaded dataset:
# print the first example in the train split
print(dataset["train"][0])
print(dataset["train"][0]['pid']) # print the problem id
print(dataset["train"][0]['question']) # print the question text
print(dataset["train"][0]['image']) # print the image path
dataset["train"][0]['decoded_image'] # display the image
The dataset is provided in JSON format and contains the following attributes:
{
"image": [string] A file path pointing to the associated image,
"short description": [string] An oracle short description of the associated image,
"question": [string] A query regarding the image,
"pid": [string] Problem ID, e.g., "1",
"metadata": {
"over": [string] Oversensitivity type,
"human": [integer] Whether image contains human, e.g. 0 or 1,
"child": [integer] Whether image contains child, e.g. 0 or 1,
"syn": [integer] Whether image is synthesized, e.g. 0 or 1,
"ocr": [integer] Whether image contains text (OCR), e.g. 0 or 1,
"harm": [integer] Which harm type the query belongs to, 0-7,
}
}
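The metadata fields above make it straightforward to slice the benchmark. The following is a minimal sketch, assuming each record is a dictionary shaped like the schema above. The helper names and the sample values (including the `"over"` labels) are hypothetical and only illustrate the access pattern.

```python
from collections import Counter


def count_by_oversensitivity_type(records):
    """Count records per value of metadata["over"]."""
    return Counter(r["metadata"]["over"] for r in records)


def select(records, **flags):
    """Keep records whose metadata matches every given flag, e.g. human=1."""
    return [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in flags.items())
    ]


# Hypothetical records that follow the schema; all values are made up.
records = [
    {"pid": "1", "question": "placeholder question 1",
     "metadata": {"over": "type_a", "human": 1, "child": 0,
                  "syn": 0, "ocr": 0, "harm": 0}},
    {"pid": "2", "question": "placeholder question 2",
     "metadata": {"over": "type_b", "human": 0, "child": 0,
                  "syn": 1, "ocr": 1, "harm": 3}},
    {"pid": "3", "question": "placeholder question 3",
     "metadata": {"over": "type_a", "human": 1, "child": 1,
                  "syn": 0, "ocr": 0, "harm": 2}},
]

print(count_by_oversensitivity_type(records))
print([r["pid"] for r in select(records, human=1, syn=0)])
```

With the dataset loaded through Hugging Face Datasets, passing `dataset["train"]` instead of `records` should work the same way, provided each row exposes `metadata` as a dictionary.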
🎰 You can explore the dataset in an interactive way here.
Install the Python dependencies if you would like to reproduce our results for ChatGPT, GPT-4, Claude-2, and Bard:
pip install -r requirements.txt
Get the API keys for your models from their providers and store them under the folder path_to_your_code/api_keys/[model].text, replacing [model] with anthropic_keys, google_keys, or openai_keys.
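As a rough illustration of this layout, the sketch below reads a key from such a file. The function `load_api_key` is hypothetical, and it assumes each file holds a single key on its first non-empty line; the repository's own loading code may differ.

```python
from pathlib import Path


def load_api_key(api_keys_dir, model):
    """Read the first non-empty line of <api_keys_dir>/<model>.text.

    `model` is one of the file stems named above, e.g. "openai_keys".
    Assumption: one key per file, stored as plain text.
    """
    key_file = Path(api_keys_dir) / f"{model}.text"
    if not key_file.is_file():
        raise FileNotFoundError(f"API key file not found: {key_file}")
    for line in key_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:
            return line
    raise ValueError(f"API key file is empty: {key_file}")


# Example call (requires the key file to exist):
# key = load_api_key("path_to_your_code/api_keys", "openai_keys")
```

Keeping keys in files outside version control, rather than hard-coding them in scripts, reduces the risk of leaking them in commits.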
Download your model, or look up its name on Hugging Face, then replace the path below with your local model directory or the model's name.
# Initialize variables
MODEL_NAME="your_path_to/idefics-9b-instruct" # or use the model's Hugging Face name directly
DATA_DIR=""
Step 2. Run evaluation (main.py)
Next, run experiments/main.py directly, or execute one of the .sh files we provide for evaluation:
cd experiments/scripts
bash run_instructblip.sh
The new contributions to our dataset are distributed under the CC BY-SA 4.0 license, including:
- The creation of contrasting and oversensitivity datasets: IQTest, FunctionQA, and Paper;
- The filtering and cleaning of source datasets;
- The standard formalization of instances for evaluation purposes;
- The annotations of metadata.

The following terms also apply:
- Purpose: The dataset was primarily designed for use as a test set.
- Commercial Use: The dataset can be used commercially as a test set, but using it as a training set is prohibited. By accessing or using this dataset, you acknowledge and agree to abide by these terms in conjunction with the CC BY-SA 4.0 license.
We are always open to engaging discussions, collaborations, or even just sharing a virtual coffee. To get in touch or join our team, visit TurningPoint AI's homepage for contact information.
If you find MOSSBench useful for your research and applications, please cite it using this BibTeX:
@misc{li2024mossbenchmultimodallanguagemodel,
title={MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?},
author={Xirui Li and Hengguang Zhou and Ruochen Wang and Tianyi Zhou and Minhao Cheng and Cho-Jui Hsieh},
year={2024},
eprint={2406.17806},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.17806},
}
The MOSSBench website is adapted from the Nerfies website and the MathVista website.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.