
[YAICON24W] Can We Add Personalized Knowledge to LVLMs Naively? : A Preliminary Exploration


🥉 YAICON24W 3rd Prize 🥉


🔍 Project Overview

Introduction

Note: This project is not a full-fledged research project; it is only a preliminary exploration. Do not use its results as a reference for any research or development.

This is the repository for "Can We Add Personalized Knowledge to LVLMs Naively? : A Preliminary Exploration", presented at YAICON 2024 Winter. The project is a preliminary exploration of adding personalized knowledge to Large Vision-Language Models (LVLMs) naively (i.e., with fine-tuning only).

Preliminary Results

Key Finding: LoRA can add knowledge while preserving the original capabilities of the LVLMs.


📌 Qualitative Results: In some cases, the LVLMs indeed demonstrate the ability to learn personalized knowledge without relying on pure memorization.


Discussion


🧪 Tutorial

Installation

  1. Install Necessary Packages
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  2. Install Additional Packages for Training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
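
The editable installs above assume you run them from the root of a local clone of this repository (which contains the LLaVA code used below); a minimal sketch, using this repository's URL:

git clone https://github.com/rubato-yeong/YAICON24W-Multimodal.git
cd YAICON24W-Multimodal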

Training

  1. Prepare Training Dataset (json file + image folder)
[
  {
    "id": "000000033471",
    "image": "000000033471.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhat are the colors of the bus in the image?"
      },
      {
        "from": "gpt",
        "value": "The bus in the image is white and red."
      }
    ]
  }
  ...
]
  2. Train LLaVA (set --data_path and --image_folder; see the sketch after the commands below)
bash scripts/v1.5/finetune_task.sh
bash scripts/v1.5/finetune_task_lora.sh
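
Both scripts follow the upstream LLaVA task fine-tuning scripts and pass the dataset paths straight to the trainer. A rough sketch of the lines to adjust inside scripts/v1.5/finetune_task_lora.sh (the paths and model name are placeholders; all other flags stay as shipped in the script):

deepspeed llava/train/train_mem.py \
    --lora_enable True \
    --model_name_or_path liuhaotian/llava-v1.5-7b \
    --data_path ./playground/data/personalized_train.json \
    --image_folder ./playground/data/images \
    --output_dir ./checkpoints/llava-v1.5-7b-personalized-lora
    # ...remaining training flags unchanged from the original script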

Evaluation

  1. Prepare Evaluation Dataset (jsonl file + image folder)
{"question_id": 0, "image": "kar40.jpg", "text": "Could you provide details about the person in this picture?", "category": "default"}
{"question_id": 1, "image": "kar40.jpg", "text": "Who is the person shown in this image?", "category": "default"}
...
  2. Evaluate LLaVA (set --model-path, --question-file, --image-folder, and --answers-file; see the sketch below)

Tip: For faster evaluation, set --max_new_tokens to a smaller value. (llava/eval/model_vqa_loader.py)

bash scripts/v1.5/eval/***.sh
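
Most of these scripts wrap llava/eval/model_vqa_loader.py (the file referenced in the tip above). A minimal sketch of a direct invocation, with placeholder paths and assuming a LoRA checkpoint trained on top of llava-v1.5-7b:

python -m llava.eval.model_vqa_loader \
    --model-path ./checkpoints/llava-v1.5-7b-personalized-lora \
    --model-base liuhaotian/llava-v1.5-7b \
    --question-file ./playground/data/eval/personalized_questions.jsonl \
    --image-folder ./playground/data/eval/images \
    --answers-file ./playground/data/eval/answers.jsonl \
    --temperature 0 \
    --conv-mode vicuna_v1

Answers are written to the --answers-file path, one JSON object per line, and can then be inspected or scored.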

🐣 Contribution

Team Members

  • Dabin Lee: Leader, Presentation, Data Collection
  • Jinyeong Kim: Model Implementation, Visualization, Presentation
  • Jungsik Yoon: Background Research, Data Collection
  • Chanyong Yoon: Data Collection
  • Sangmin Lee: Data Preprocessing, Model Implementation

Acknowledgement

We sincerely thank the organizers of YAICON 2024 Winter for providing us with the opportunity to present our preliminary exploration. We also extend our gratitude to the authors of the excellent LLaVA codebase, on which this project is built.
