
[YAICON24W] Can We Add Personalized Knowledge to LVLMs Naively? : A Preliminary Exploration


🥉 YAICON24W 3rd Prize 🥉


🔍 Project Overview

Introduction

Note: This project is not a full-fledged research project; it is only a preliminary exploration. Do not use its results as a reference for any research or development.

This is the repository for "Can We Add Personalized Knowledge to LVLMs Naively? : A Preliminary Exploration", presented at YAICON 2024 Winter. The project is a preliminary exploration of adding personalized knowledge to Large Vision-Language Models (LVLMs) naively (i.e., with fine-tuning only).

Preliminary Results

Key Finding: LoRA can add knowledge while preserving the original capabilities of the LVLMs.


📌 Qualitative Results: In some cases, the LVLMs indeed demonstrate the ability to learn personalized knowledge without relying on pure memorization.


Discussion


🧪 Tutorial

Installation

  1. Install Necessary Packages
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  2. Install Additional Packages for Training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
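
The editable installs above assume you run them from the root of a local clone of this repository (which contains the LLaVA code used below); a minimal sketch, using this repository's URL:

git clone https://github.com/rubato-yeong/YAICON24W-Multimodal.git
cd YAICON24W-Multimodal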

Training

  1. Prepare Training Dataset (json file + image folder)
[
  {
    "id": "000000033471",
    "image": "000000033471.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhat are the colors of the bus in the image?"
      },
      {
        "from": "gpt",
        "value": "The bus in the image is white and red."
      }
    ]
  }
  ...
]
  2. Train LLaVA (set --data_path and --image_folder; see the sketch after the commands below)
bash scripts/v1.5/finetune_task.sh
bash scripts/v1.5/finetune_task_lora.sh
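
Both scripts follow the upstream LLaVA task fine-tuning scripts and pass the dataset paths straight to the trainer. A rough sketch of the lines to adjust inside scripts/v1.5/finetune_task_lora.sh (the paths and model name are placeholders; all other flags stay as shipped in the script):

deepspeed llava/train/train_mem.py \
    --lora_enable True \
    --model_name_or_path liuhaotian/llava-v1.5-7b \
    --data_path ./playground/data/personalized_train.json \
    --image_folder ./playground/data/images \
    --output_dir ./checkpoints/llava-v1.5-7b-personalized-lora
    # ...remaining training flags unchanged from the original script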

Evaluation

  1. Prepare Evaluation Dataset (jsonl file + image folder)
{"question_id": 0, "image": "kar40.jpg", "text": "Could you provide details about the person in this picture?", "category": "default"}
{"question_id": 1, "image": "kar40.jpg", "text": "Who is the person shown in this image?", "category": "default"}
...
  2. Evaluate LLaVA (set --model-path, --question-file, --image-folder, and --answers-file; see the sketch below)

Tip: For faster evaluation, set --max_new_tokens to a smaller value. (llava/eval/model_vqa_loader.py)

bash scripts/v1.5/eval/***.sh
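
Most of these scripts wrap llava/eval/model_vqa_loader.py (the file referenced in the tip above). A minimal sketch of a direct invocation, with placeholder paths and assuming a LoRA checkpoint trained on top of llava-v1.5-7b:

python -m llava.eval.model_vqa_loader \
    --model-path ./checkpoints/llava-v1.5-7b-personalized-lora \
    --model-base liuhaotian/llava-v1.5-7b \
    --question-file ./playground/data/eval/personalized_questions.jsonl \
    --image-folder ./playground/data/eval/images \
    --answers-file ./playground/data/eval/answers.jsonl \
    --temperature 0 \
    --conv-mode vicuna_v1

Answers are written to the --answers-file path, one JSON object per line, and can then be inspected or scored.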

🐣 Contribution

Team Members

  • Dabin Lee: Leader, Presentation, Data Collection
  • Jinyeong Kim: Model Implementation, Visualization, Presentation
  • Jungsik Yoon: Background Research, Data Collection
  • Chanyong Yoon: Data Collection
  • Sangmin Lee: Data Preprocessing, Model Implementation

Acknowledgement

We sincerely thank the organizers of YAICON 2024 Winter for providing us with the opportunity to present our preliminary exploration. We also extend our gratitude to the authors of the excellent LLaVA codebase, on which this project is built.
