This is the official repository for our work "Pensieve", a training-free method to mitigate visual hallucination in Multi-modal LLMs.
- [2024-04-??]: 🧑🏻💻👩🏼💻 Code Release.
- [2024-03-21]: 🎉🎉 Our paper is available on arXiv.
- We introduce Pensieve, a plug-and-play and training-free method to mitigate visual hallucination and enhance the specificity of image descriptions.
- Install pycocoevalcap for image captioning evaluation.
- Prepare FaithScore for visual hallucination evaluation.
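The evaluation setup above can be sketched as follows. This is a minimal example assuming a standard pip environment; `pycocoevalcap` is the PyPI package for the COCO captioning metrics, while FaithScore should be set up by following the instructions in its own repository.

```shell
# Install the COCO caption evaluation toolkit (BLEU, METEOR, CIDEr, SPICE)
pip install pycocoevalcap

# FaithScore: follow the setup instructions in its official repository
# (repository URL and steps are described in the FaithScore project page)
```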
Our project is built upon VCD. We sincerely acknowledge the great contributions of the following works:
- VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
- DOLA: Decoding by Contrasting Layers Improves Factuality in Large Language Models
- FaithScore: Evaluating Hallucinations in Large Vision-Language Models
- LLaVA-1.5: Improved Baselines with Visual Instruction Tuning
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
If you find our project useful, please consider citing our paper:
@article{yang2024pensieve,
  title={Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination},
  author={Yang, Dingchen and Cao, Bowen and Chen, Guang and Jiang, Changjun},
  journal={arXiv preprint arXiv:2403.14401},
  year={2024}
}