Note: this is not a full-fledged research project, only a preliminary exploration. Do not use its results as a reference for any research or development.
This is the repository for "Can We Add Personalized Knowledge to LVLMs Naively?: A Preliminary Exploration" from YAICON 2024 Winter. This project is a preliminary exploration of adding personalized knowledge to Large Vision-Language Models (LVLMs) naively (i.e., with fine-tuning only).
✨ Key Finding: LoRA can add knowledge while preserving the original capabilities of the LVLMs.
📌 Qualitative Results: In several cases, LVLMs demonstrate the ability to learn personalized knowledge without relying on memorization.
- Related to "LoRA Learns Less and Forgets Less"
- Limitations: inefficient training scheme; only applied to a small-scale dataset
- Install Necessary Packages
```bash
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
- Install Additional Packages for Training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
- Prepare Training Dataset (JSON file + image folder); the expected format and a helper sketch are shown below
```json
[
  {
    "id": "000000033471",
    "image": "000000033471.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhat are the colors of the bus in the image?"
      },
      {
        "from": "gpt",
        "value": "The bus in the image is white and red."
      }
    ]
  },
  ...
]
```
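The sketch below shows one way to generate a training file in this format. The file names and the example Q/A content are placeholders, not files shipped with this repo; adapt them to your own personalized data.

```python
# Sketch: write personalized Q/A pairs into the training JSON format shown above.
# File names and the example Q/A content are placeholders.
import json
import os

qa_pairs = {
    # image file name (relative to --image_folder) -> list of (question, answer) pairs
    "kar40.jpg": [
        ("Who is the person shown in this image?", "<answer describing the person>"),
    ],
}

samples = []
for image_name, pairs in qa_pairs.items():
    conversations = []
    for i, (question, answer) in enumerate(pairs):
        # Only the first human turn of a sample carries the <image> token.
        prefix = "<image>\n" if i == 0 else ""
        conversations.append({"from": "human", "value": prefix + question})
        conversations.append({"from": "gpt", "value": answer})
    samples.append({
        "id": os.path.splitext(image_name)[0],
        "image": image_name,
        "conversations": conversations,
    })

with open("personal_train.json", "w") as f:
    json.dump(samples, f, indent=2, ensure_ascii=False)
```

Point `--data_path` at the generated JSON file and `--image_folder` at the folder containing the referenced images.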
- Train LLaVA (set `--data_path` and `--image_folder`; a pre-flight check sketch follows the commands below)
```bash
bash scripts/v1.5/finetune_task.sh
bash scripts/v1.5/finetune_task_lora.sh
```
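Before launching either script, a quick pre-flight check (an assumed helper, not part of this repo) that every referenced image exists under `--image_folder` can catch path mistakes early:

```python
# Assumed helper: verify that every "image" entry in the file passed as
# --data_path resolves to a file under --image_folder.
import json
import os

data_path = "personal_train.json"          # placeholder for your --data_path
image_folder = "playground/data/images"    # placeholder for your --image_folder

with open(data_path) as f:
    samples = json.load(f)

missing = [s["image"] for s in samples
           if not os.path.isfile(os.path.join(image_folder, s["image"]))]
print(f"{len(samples)} samples checked, {len(missing)} image(s) missing")
for name in missing:
    print("missing:", name)
```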
- Prepare Evaluation Dataset (JSONL file + image folder); the expected format and a helper sketch are shown below
{"question_id": 0, "image": "kar40.jpg", "text": "Could you provide details about the person in this picture?", "category": "default"}
{"question_id": 1, "image": "kar40.jpg", "text": "Who is the person shown in this image?", "category": "default"}
...
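A minimal sketch for writing questions in this JSONL format, one JSON object per line (the file name and questions are placeholders):

```python
# Sketch: write evaluation questions in the JSONL format shown above.
# File name, image name, and questions are placeholders.
import json

questions = [
    "Could you provide details about the person in this picture?",
    "Who is the person shown in this image?",
]

with open("personal_questions.jsonl", "w") as f:
    for i, text in enumerate(questions):
        record = {"question_id": i, "image": "kar40.jpg", "text": text, "category": "default"}
        f.write(json.dumps(record) + "\n")
```

Pass the generated file as `--question-file` during evaluation.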
- Evaluate LLaVA (set `--model-path`, `--question-file`, `--image-folder`, and `--answers-file`; a sketch for skimming the answers file follows the command below)
Tip: For faster evaluation, set `--max_new_tokens` to a smaller value (in `llava/eval/model_vqa_loader.py`).
```bash
bash scripts/v1.5/eval/***.sh
```
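To skim the results afterwards, a small helper like the one below works, assuming the answers file contains one JSON object per line with at least `question_id` and the generated `text` (the format written by `llava/eval/model_vqa_loader.py`):

```python
# Assumed inspection helper: print each generated answer from the --answers-file
# output, assuming one JSON object per line with "question_id" and "text" fields.
import json

answers_file = "personal_answers.jsonl"  # placeholder for your --answers-file

with open(answers_file) as f:
    for line in f:
        ans = json.loads(line)
        print(f"[{ans['question_id']}] {ans['text']}")
```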
- Dabin Lee: Leader, Presentation, Data Collection
- Jinyeong Kim: Model Implementation, Visualization, Presentation
- Jungsik Yoon: Background Research, Data Collection
- Chanyong Yoon: Data Collection
- Sangmin Lee: Data Preprocessing, Model Implementation
We sincerely thank the organizers of YAICON 2024 Winter for the opportunity to present our preliminary exploration. We also extend our gratitude to the authors of the excellent LLaVA codebase.