This repository contains the source code for the CVPR 2025 paper: "Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception". 📄✨
We’ve built a benchmark to evaluate the robustness of recent advanced Large Vision-Language Models (LVLMs) against Counterfactual Presupposition Questions (CPQs). The benchmark assesses two abilities (a minimal scoring sketch follows this list):
- Rejection of Counterfactual Presupposition (CP): Whether the model can explicitly reject the counterfactual presupposition in CPQs.
- Acceptance of True Presupposition (TP): Whether the model can accept the true presupposition in True Presupposition Questions (TPQs) and provide appropriate answers.
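As a minimal illustration of the two axes, the sketch below scores a paired CPQ and TPQ for the same image. Everything here is hypothetical: `model.query` stands in for whatever inference call an LVLM exposes, and keyword matching is a crude stand-in for the judging protocol actually used in the benchmark.

```python
# Hypothetical scoring sketch for the paired CPQ/TPQ protocol.
# `model.query` and REFUSAL_CUES are illustrative placeholders,
# not the benchmark's actual judging implementation.

REFUSAL_CUES = ("there is no", "does not exist", "not present", "no such")

def judge_pair(model, image, cpq: str, tpq: str) -> dict:
    """Score one image on both axes: CP rejection and TP acceptance."""
    cp_answer = model.query(image, cpq)  # question with a false presupposition
    tp_answer = model.query(image, tpq)  # question with a true presupposition

    def refused(answer: str) -> bool:
        return any(cue in answer.lower() for cue in REFUSAL_CUES)

    return {
        "rejects_cp": refused(cp_answer),      # ideally True
        "accepts_tp": not refused(tp_answer),  # ideally True
    }
```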
Here are some key findings:
- Current LVLMs struggle to balance rejecting CPs and accepting TPs.
- Earlier LVLMs (e.g., LLaVA-1.5-7B) have weak CP rejection capabilities: they often answer CPQs blindly, which leads to significant hallucinations.
- Recent LVLMs show improved CP rejection, but some (e.g., Qwen-2.5-VL-72B) appear overly cautious.
- Models with reasoning capabilities (e.g., Qwen-2-VL-72B and QvQ-72B) demonstrate better CP rejection.
- Within the same model series, scaling up the underlying LLM can improve both CP and TP judgment (e.g., the InternVL-2 series: 8B vs. 70B; Qwen2.5-VL: 7B vs. 72B).
Due to time and resource constraints, we haven’t extensively tested recent proprietary LVLMs yet. We’ll update this section in the future. 🚧
In this repository, we’ve structured our Antidote framework into three stages:
Data Pipeline (`data_pipeline`) → Inference (`inference`) → Preference Alignment (`preference_alignment`)
Detailed instructions are provided in each stage folder.
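For orientation, a hypothetical end-to-end run might look like the following. The script names below are placeholders; the real entry points and arguments are documented in each stage folder.

```bash
# Hypothetical walk-through; script names are placeholders.
# See each stage folder for the actual entry points and arguments.
bash data_pipeline/run.sh           # 1) construct the training data
bash inference/run.sh               # 2) sample model responses
bash preference_alignment/run.sh    # 3) run preference alignment
```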
NOTES:
- We refactored the code with the SWIFT framework (https://github.com/modelscope/ms-swift) for better compatibility with existing LVLMs.
- Please refer to the SWIFT documentation for training implementation details (a sketch of a typical launch command follows).
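For example, with SWIFT installed (`pip install ms-swift`), a preference-alignment run can be launched from the CLI along these lines. This is a sketch only: the flag names track recent ms-swift releases and may differ in your version, and the model ID and dataset path are placeholders.

```bash
# Sketch of a DPO-style preference-alignment run with ms-swift.
# Flags may differ across ms-swift versions; model and dataset
# values are placeholders. Check the SWIFT docs before running.
swift rlhf \
    --rlhf_type dpo \
    --model llava-hf/llava-1.5-7b-hf \
    --train_type lora \
    --dataset /path/to/antidote_preference_data.jsonl \
    --output_dir output/antidote
```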
Please kindly cite our paper if you find it helpful in your work:
@article{wu2025antidote,
  title={Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception},
  author={Wu, Yuanchen and Zhang, Lu and Yao, Hang and Du, Junlong and Yan, Ke and Ding, Shouhong and Wu, Yunsheng and Li, Xiaoqiang},
  journal={arXiv preprint arXiv:2504.20468},
  year={2025}
}
We gratefully acknowledge the significant contributions of the open-source community, and we especially thank the developers and contributors of MS-Swift, Grounding-DINO, and Stable-Diffusion 3 for their brilliant work!

