Failed to reproduce LongVA-7B after training from scratch #37

@nanocm

Description
I tried to reproduce the model. Below are the steps I followed:

  1. pretrain
    First, I ran scripts/pretrain.sh, which produces the projector. The pretraining data comes from https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain. I also added a few lines to model/llava_arch.py (prepare_inputs_labels_for_multimodal) because of an incorrect input dimension when using liuhaotian/LLaVA-Pretrain directly: specifically, I unsqueeze the image tensor to match the expected "5-dimension input", and I use a batch size of 1 in case the modification causes other errors.
  2. finetune
    Then, I executed scripts/finetune.sh, using the projector from step 1 and the Qwen-224K LLM from Hugging Face. The dataset I used is https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data.
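For reference, the dimension workaround described in step 1 amounts to something like the sketch below. This is only an illustration of the unsqueeze I applied, not the actual LLaVA-NeXT code; the function name `ensure_5d` and the assumed layouts (`(tiles, C, H, W)` for a single sample vs. `(batch, tiles, C, H, W)` for a batch) are my own assumptions.

```python
import torch

def ensure_5d(images: torch.Tensor) -> torch.Tensor:
    """Add a leading batch dimension when the image tensor is only 4-D.

    Hypothetical sketch: prepare_inputs_labels_for_multimodal seemed to
    expect a 5-D input, but a single LLaVA-Pretrain image loads as 4-D.
    """
    if images.dim() == 4:
        # (tiles, C, H, W) -> (1, tiles, C, H, W)
        images = images.unsqueeze(0)
    return images

# Example: one sample of 5 image tiles at 336x336 resolution.
image = torch.zeros(5, 3, 336, 336)
print(ensure_5d(image).shape)  # torch.Size([1, 5, 3, 336, 336])
```

With batch size 1, as mentioned above, each sample passes through this path individually, which avoids mixing samples of different tile counts.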

Then, I obtained what I believe is "LongVA-7B" (I did not run dpo.sh).
However, the test results differ substantially both from those in the paper (possibly due to lmms-eval) and from the released checkpoints on Hugging Face.

I noticed that LLaVA-NeXT-Data contains some private data, as mentioned in #10 and in the Hugging Face dataset repo.
Could the private data used during training account for the difference?
