Awesome LVLM Hallucination

All the papers listed in this project come from my usual reading. If you have found some new and interesting papers, I would appreciate it if you let me know!!!

Survey:

A Survey on Hallucination in Large Vision-Language Models

Hallucination of Multimodal Large Language Models: A Survey

Hallucination Mitigation (LVLM):

  1. LRV-Instruction: Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning, (Liu et al., ICLR 2024)
    • [dataset] proposes an instruction-tuning dataset that includes both positive and negative samples
    • GAIVE: an evaluation approach that uses GPT-4
  2. LURE: Analyzing and Mitigating Object Hallucination in Large Vision-Language Models, (Zhou et al., ICLR 2024)
    • [post-hoc revision] trains a revision model to detect and correct hallucinated objects in the base model's response
  3. HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision-Language Models for Detailed Caption, (Zhai et al., 2023)
    • CCEval, a GPT-4-assisted evaluation method tailored for detailed captioning
  4. Woodpecker: Hallucination Correction for Multimodal Large Language Models, (Yin et al.)
    • [revision] post-hoc correction
    • requires additional pre-trained visual models
  5. LLaVA-RLHF: Aligning Large Multimodal Models with Factually Augmented RLHF, (Sun et al.)
    • [RLHF-PPO] the first LMM trained with RLHF
    • proposes the MMHal-Bench benchmark
  6. Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision, (Lee et al.)
    • self-feedback: the model revises its response according to self-generated natural-language feedback
  7. HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data, (Yu et al.)
  8. VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding, (Leng et al.) (highly recommended)
    • contrastive decoding
  9. HA-DPO: Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
  10. Mitigating Hallucination in Visual Language Models with Visual Supervision, (Chen et al.)
  11. OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation, (Huang et al.) (highly recommended)
    • improves beam search
  12. FOHE: Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites, (Wang et al.)
    • uses ChatGPT for post-hoc correction
  13. RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
    • [RLHF-DPO] 1.4K preference data, natural-language feedback
  14. MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations, (Ben-Kish et al.)
    • [RLHF]
  15. HACL: Hallucination Augmented Contrastive Learning for Multimodal Large Language Model, (Jiang et al.)
  16. Silkie: Preference Distillation for Large Visual Language Models, (Li et al.)
  17. HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
    • uses "counter-conventional" data to eliminate spurious correlations
  18. MARINE: Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance
  19. CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
  20. EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
  21. Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding
  22. Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models
  23. Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
  24. MOF: Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs (one of the great classics of this area, highly recommended!!! This paper has "fed" quite a few follow-up papers.....)
  25. IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
  26. DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models
  27. HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
  28. Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective
  29. Evaluating and Mitigating Number Hallucinations in Large Vision-Language Models: A Consistency Perspective
  30. Debiasing Large Visual Language Models (exactly the same method I had in mind..... purshow's most embarrassing moment.....)
  31. The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
  32. Mitigating Dialogue Hallucination for Large Multi-modal Models via Adversarial Instruction Tuning
  33. Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination
  34. Multi-Modal Hallucination Control by Visual Information Grounding
  35. What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models
  36. Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
  37. Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
  38. H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model
  39. FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
  40. Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models
  41. BRAVE: Broadening the visual encoding of vision-language models
  42. Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
  43. LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
  44. Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning
  45. Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales
  46. Self-Supervised Visual Preference Alignment
  47. Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
  48. VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models
  49. RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs
  50. Mitigating Object Hallucination via Data Augmented Contrastive Tuning
  51. Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions
  52. Automated Multi-level Preference for MLLMs
  53. Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization
  54. VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap
  55. Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
  56. Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
  57. Calibrated Self-Rewarding Vision Language Models
  58. VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation
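Several of the entries above (HA-DPO, RLHF-V, Silkie, and the preference fine-tuning papers) share the same core objective: direct preference optimization over pairs of non-hallucinated vs. hallucinated responses. As a rough illustration of that shared objective, here is a minimal sketch of the DPO loss on a single preference pair; the log-probability numbers are made up for illustration and do not come from any of the papers:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair: push the policy to widen the
    (chosen - rejected) log-prob margin relative to a frozen reference
    model, scaled by beta. Loss = -log(sigmoid(beta * margin))."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # numerically stable -log(sigmoid(x)) = log1p(exp(-x))
    return math.log1p(math.exp(-beta * margin))

# Hypothetical log-probs of a faithful (chosen) and a hallucinated
# (rejected) caption under the policy and the reference model.
loss = dpo_loss(logp_w=-12.0, logp_l=-15.0, ref_logp_w=-14.0, ref_logp_l=-13.0)
```

When the margin is zero the loss is log 2, and it shrinks as the policy prefers the faithful caption more strongly than the reference does; the papers differ mainly in how the preference pairs are constructed (human feedback, AI feedback, induced hallucinations).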

Lessons from related work:

Survey (LLM hallucination):

  • Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models https://github.com/HillZhang1999/llm-hallucination-survey
  • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
  • A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
  • A Survey of Hallucination in “Large” Foundation Models

Interesting topic:

Decoding-level methods:

ITI: Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

CDS: Collaborative decoding of critical tokens for boosting factuality of large language models

Self-Consistent Decoding for More Factual Open Responses

Contrastive decoding (the same family of methods as VCD): I really think there are too many papers on this method....

  1. Contrastive Decoding: Open-ended Text Generation as Optimization
  2. Alleviating Hallucinations of Large Language Models through Induced Hallucinations
  3. Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
  4. SH2: Self-Highlighted Hesitation Helps You Decode More Truthfully
  5. DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
  6. Discerning and Resolving Knowledge Conflicts through Adaptive Decoding with Contextual Information-Entropy Constraint
  7. ROSE Doesn’t Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding
  8. Weak-to-Strong Jailbreaking on Large Language Models
  9. IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
  10. HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
  11. ......
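The papers above all revolve around one scoring rule: amplify the full ("expert") model's log-probabilities by subtracting those of a weaker or deliberately degraded "amateur" pass, usually under an adaptive plausibility constraint that keeps only tokens the expert itself rates as likely. A minimal sketch of that rule, with the alpha/beta names following common usage in this line of work and a made-up toy vocabulary:

```python
import math

def contrastive_scores(logp_expert, logp_amateur, alpha=1.0, beta=0.1):
    """Score tokens by (1 + alpha) * logp_expert - alpha * logp_amateur,
    keeping only tokens whose expert probability is within a factor beta
    of the expert's best token (the adaptive plausibility constraint)."""
    cutoff = math.log(beta) + max(logp_expert.values())
    return {
        tok: (1 + alpha) * lp - alpha * logp_amateur[tok]
        for tok, lp in logp_expert.items()
        if lp >= cutoff  # prune tokens the expert itself finds implausible
    }

# Toy next-token distributions. The "amateur" pass (e.g. the LVLM fed a
# noise-distorted image, as in VCD) leans toward the hallucinated token,
# so the contrast demotes it.
expert = {"dog": math.log(0.5), "cat": math.log(0.3), "unicorn": math.log(0.2)}
amateur = {"dog": math.log(0.2), "cat": math.log(0.3), "unicorn": math.log(0.5)}
scores = contrastive_scores(expert, amateur)
best = max(scores, key=scores.get)  # "dog"
```

The papers differ mainly in where the amateur distribution comes from: a smaller LM, earlier layers (DoLa), a distorted image (VCD), a biased prompt (IBD), and so on; the combination rule itself stays essentially this one.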

Acknowledgments

I am immensely grateful to two projects that significantly influenced this work: awesome-Large-MultiModal-Hallucination and Awesome-MLLM-Hallucination. The dedication of their contributors, particularly xieyuquanxx and the team at Show Lab, has provided an indispensable resource for researchers and developers alike. The awesome-Large-MultiModal-Hallucination project offers a comprehensive, curated list of resources that shaped my understanding of multimodal hallucination, and the Awesome-MLLM-Hallucination repository showcases cutting-edge techniques in MLLM hallucination research. By sharing their expertise and compiling these resources, they have both advanced the field and fostered a spirit of collaboration and open knowledge. My work builds on and extends theirs, and I extend my heartfelt thanks. Thank you for setting a remarkable example for the open-source and scientific communities.
