MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model

The Hong Kong University of Science and Technology (Guangzhou) · The Hong Kong University of Science and Technology · Tongji University

Official implementation of "MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model". Our code builds on Tang's language-specific neurons implementation here and nrimsky's logit lens implementation here. Many thanks for their efforts!

Updates

  • 17 June, 2024: Paper published on arXiv.
  • 17 June, 2024: Code published.
  • 20 September, 2024: Paper accepted to the EMNLP main conference!

This repository contains the official implementation of the following paper:

MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model https://arxiv.org/abs/2406.11193

Abstract: Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identify domain-specific neurons in multimodal large language models. Specifically, we investigate the distribution of domain-specific neurons and the mechanism of how MLLMs process features from diverse domains. Furthermore, we propose a three-stage framework for language model modules in MLLMs when handling projected image features, and verify this hypothesis using logit lens. Extensive experiments indicate that while current MLLMs exhibit Visual Question Answering (VQA) capability, they may not fully utilize domain-specific information. Manipulating domain-specific neurons properly will result in a 10% change of accuracy at most, shedding light on the development of cross-domain, all-encompassing MLLMs in the future. Our code will be released upon paper notification.

Todo

  1. Release the code.

Get Started

Install

conda create -n mmneuron python=3.10
conda activate mmneuron
pip install -r requirements.txt

Dataset Preparation

Download the following datasets and put them in a directory named "benchs".

LingoQA

Autonomous driving VQA. You can get the dataset here.

VQAv2

Common-life VQA. You can get the dataset here.

DocVQA

Document VQA. You can get the dataset here (or huggingface link here).

PMC-VQA

Medical VQA. You can get the dataset here.

RS-VQA(HS)

Remote sensing VQA. You can get the dataset here.

The final directory should look like:
├── ad
│   ├── images
│   ├── images.zip
│   ├── train.parquet
│   └── val
├── med
│   ├── figures
│   ├── images
│   ├── test_2.csv
│   ├── test_clean.csv
│   ├── test.csv
│   ├── train_2.csv
│   └── train.csv
├── rs
│   ├── Data
│   ├── USGSanswers.json
│   ├── USGSimages.json
│   ├── USGSquestions.json
│   ├── USGS_split_test_answers.json
│   ├── USGS_split_test_questions.json
│   ├── USGS_split_val_answers.json
│   ├── USGS_split_val_images.json
│   └── USGS_split_val_questions.json
└── vqav2
  ├── data
  ├── README.md
  ├── train2014
  ├── v2_mscoco_train2014_annotations.json
  ├── v2_mscoco_val2014_annotations.json
  ├── v2_mscoco_val2014_ansdict.json
  ├── v2_OpenEnded_mscoco_train2014_questions.json
  ├── v2_OpenEnded_mscoco_val2014_questions.json
  └── val2014

Neuron Activation

After preparing the data, you can record the activation probabilities of neurons in LLaVA-Next and InstructBLIP by running the Python script:

python activation.py -m llava
python activation.py -m blip

For simplicity, we use the Hugging Face APIs of LLaVA-Next and InstructBLIP here; you can also find their official implementations at URL1 and URL2.
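
The exact recording logic lives in activation.py. Purely as a rough, hedged sketch, the snippet below shows one way to count, with forward hooks on the Hugging Face LLaVA-Next model, how often each FFN neuron in the language model fires (post-activation value > 0) on a domain's image-question pairs. The checkpoint name, file paths, and attribute paths (which can differ across transformers versions) are assumptions, not the repository's code.

# Hedged sketch, not the repository's activation.py: count how often each FFN neuron
# in LLaVA-Next's language model fires (post-SiLU activation > 0) on one domain.
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-vicuna-7b-hf"                 # assumed checkpoint
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

layers = model.language_model.model.layers                    # LLaMA decoder layers (path may vary by transformers version)
n_layers = len(layers)
d_ff = model.config.text_config.intermediate_size
active = torch.zeros(n_layers, d_ff)                          # per-neuron firing counts
total_tokens = 0

def make_hook(layer_idx):
    def hook(module, inputs, output):                         # output: (batch, seq, d_ff) post-activation values
        global total_tokens
        acts = output.float().flatten(0, 1)
        active[layer_idx] += (acts > 0).sum(dim=0).cpu()
        if layer_idx == 0:
            total_tokens += acts.shape[0]
    return hook

handles = [layer.mlp.act_fn.register_forward_hook(make_hook(i)) for i, layer in enumerate(layers)]

# One forward pass per sample; in practice, loop over every image-question pair of the domain.
image = Image.open("benchs/vqav2/val2014/example.jpg")        # hypothetical sample
prompt = "USER: <image>\nWhat is shown in the picture? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    model(**inputs)

for h in handles:
    h.remove()

activation_prob = active / max(total_tokens, 1)               # activation probability per neuron
torch.save(activation_prob, "activation_prob_vqav2.pt")       # hypothetical output name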

Domain-specific Neuron Identification

After obtaining the activation probabilities, you can identify the domain-specific neurons in LLaVA-Next's or InstructBLIP's language model by running the Python script:

python identify.py -m llava -d lang

For LLaVA-Next, the options of '-d' are ['lang','vision','mmproj'].
For InstructBLIP, the options of '-d' are ['lang','encoder','qformer','query'].
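
The selection criterion itself is implemented in identify.py. As a hedged sketch only, an entropy-based selection in the spirit of the language-specific-neuron method this repository builds on could look like the following; the domain tags, thresholds, and file names are illustrative assumptions.

# Hedged sketch of entropy-based selection (thresholds and file names are illustrative,
# not the repository's identify.py): a neuron is domain-specific if its activation
# probability is concentrated on one domain and reasonably high there.
import torch

domains = ["vqav2", "docvqa", "med", "rs", "ad"]              # assumed domain tags
# (n_domains, n_layers, d_ff) activation probabilities from the previous step
probs = torch.stack([torch.load(f"activation_prob_{d}.pt") for d in domains])

dist = probs / probs.sum(dim=0).clamp_min(1e-8)               # normalize over domains
entropy = -(dist * dist.clamp_min(1e-8).log()).sum(dim=0)     # low entropy = domain-skewed

low_entropy = entropy < entropy.flatten().quantile(0.01)      # keep the most skewed 1%
for i, d in enumerate(domains):
    specific = low_entropy & (probs[i] > 0.2)                 # also require frequent firing in d
    layer_idx, neuron_idx = specific.nonzero(as_tuple=True)
    print(f"{d}: {specific.sum().item()} candidate neurons")
    torch.save(torch.stack([layer_idx, neuron_idx], dim=1), f"{d}_neurons.pt")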

Generation

Generate responses for the VQA datasets by running:

python generate.py -m llava
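
generate.py drives this over whole benchmarks. Purely as an illustrative, hedged sketch of what a single LLaVA-Next call through the Hugging Face API looks like (the checkpoint, image path, and prompt template are assumptions):

# Hedged sketch of one VQA generation with the Hugging Face LLaVA-Next API
# (generate.py loops over the whole benchmark; names here are assumptions).
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-vicuna-7b-hf"                 # assumed checkpoint
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("benchs/vqav2/val2014/example.jpg")        # hypothetical sample
prompt = "USER: <image>\nWhat color is the bus? Answer with one word. ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=16, do_sample=False)
answer = processor.decode(output_ids[0], skip_special_tokens=True)
print(answer.split("ASSISTANT:")[-1].strip())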

Evaluation

The evaluation code can be found in evaluate.py. The input file should contain three fields: ground_truth, answer, and index. Evaluate model performance by running:

python evaluate.py -m llava
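
As a hedged sketch only, a minimal exact-match check over such a results file might look like the following; the file name and JSON layout are assumptions, and evaluate.py may use a softer matching rule.

# Hedged sketch of exact-match accuracy over records with "ground_truth", "answer",
# and "index" fields (file name and layout are assumptions, not evaluate.py itself).
import json

with open("llava_vqav2_results.json") as f:                   # hypothetical results file
    records = json.load(f)

correct = sum(r["ground_truth"].strip().lower() == r["answer"].strip().lower()
              for r in records)
print(f"accuracy: {correct / len(records):.4f} over {len(records)} questions")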

Logit Lens

Run the command

python logit_len.py -m llava

to investigate the hidden states of intermediate layers.
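
For intuition, the logit lens decodes each intermediate hidden state through the model's final norm and unembedding matrix to see which token that layer would currently predict. Below is a hedged sketch against the Hugging Face LLaVA-Next model; the checkpoint, sample, and attribute paths (which vary across transformers versions) are assumptions, not logit_len.py itself.

# Hedged sketch of the logit lens: project every layer's last-position hidden state
# through the final RMSNorm and lm_head (not the repository's logit_len.py).
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-vicuna-7b-hf"                 # assumed checkpoint
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("benchs/vqav2/val2014/example.jpg")        # hypothetical sample
prompt = "USER: <image>\nWhat is shown in the picture? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

lm = model.language_model                                     # LLaMA-style causal LM (path may vary by version)
for layer_idx, hidden in enumerate(out.hidden_states):        # embedding output + one state per layer
    h = hidden[:, -1, :]                                      # last token position
    logits = lm.lm_head(lm.model.norm(h))                     # final norm + unembedding
    token = processor.tokenizer.decode(logits.argmax(dim=-1))
    print(f"layer {layer_idx:2d}: {token!r}")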

Cite

@article{huo2024mmneuron,
  title={MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model},
  author={Huo, Jiahao and Yan, Yibo and Hu, Boren and Yue, Yutao and Hu, Xuming},
  journal={arXiv preprint arXiv:2406.11193},
  year={2024}
}

