Hi, great work!
I tried to reproduce LISA's cIoU results on the refCOCO dataset, but neither the 13B v1 nor the 13B v0 model (LISA-13B-llama2-v1, LISA-13B-llama2-v0) reached the performance of the 7B model reported in the paper.
For example, here are the results of the LISA-13B-llama2-v0 model on refCOCO (I also tested the 7B model, and its accuracy is even worse):
testA cIoU: 75.00
testB cIoU: 68.52
val   cIoU: 72.22
But the 7B model in the paper reports:
testA cIoU: 76.5
testB cIoU: 71.1
val   cIoU: 74.1
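For context on the metric being compared: as I understand it, cIoU in LISA's evaluation is the cumulative IoU used in referring-segmentation benchmarks, i.e. total intersection over total union accumulated across the whole split, not a per-image average. A minimal sketch, assuming binary masks as NumPy boolean arrays (the function name `ciou` is mine, not from the repo):

```python
import numpy as np


def ciou(preds, gts):
    """Cumulative IoU: sum of intersections over sum of unions across all samples."""
    inter = sum((p & g).sum() for p, g in zip(preds, gts))
    union = sum((p | g).sum() for p, g in zip(preds, gts))
    return inter / union if union > 0 else 0.0


# Tiny demo: half-overlapping masks give 1 / 2 = 0.5.
pred = np.array([True, True, False, False])
gt = np.array([True, False, False, False])
print(ciou([pred], [gt]))  # → 0.5
```

Because large and small objects are pooled into one ratio, a few bad predictions on large masks can move cIoU noticeably, which makes small gaps like the ones above hard to interpret per-image.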
Is this expected?
Here is the command I used:
CUDA_VISIBLE_DEVICES=0 deepspeed train_ds.py --eval_only
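Note that I changed the checkpoint and dataset paths by editing the argparse defaults directly. The same evaluation can instead override them on the command line using the flags defined in `parse_args` (the paths below are just my local ones, shown for illustration):

```shell
# Same eval run, but with the relevant defaults overridden explicitly:
CUDA_VISIBLE_DEVICES=0 deepspeed train_ds.py --eval_only \
  --version /home/data1/fzd/LLM/LISA-13B-v0 \
  --val_dataset "refcoco|unc|testB" \
  --dataset_dir /home/data1/fzd/LLM/SEG/dataset/refer_seg
```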
And here is the config:
```python
import argparse


def parse_args(args):
    parser = argparse.ArgumentParser(description="LISA Model Training")
    parser.add_argument("--local_rank", default=0, type=int, help="node rank")
    parser.add_argument("--version", default="/home/data1/fzd/LLM/LISA-13B-v0")
    parser.add_argument("--vis_save_path", default="./vis_output", type=str)
    parser.add_argument(
        "--precision",
        default="bf16",
        type=str,
        choices=["fp32", "bf16", "fp16"],
        help="precision for inference",
    )
    parser.add_argument("--image_size", default=1024, type=int, help="image size")
    parser.add_argument("--model_max_length", default=512, type=int)
    parser.add_argument("--lora_r", default=8, type=int)
    parser.add_argument(
        "--vision-tower",
        default="/home/data1/fzd/LLM/openai-clip-vit-large-patch14",
        type=str,
    )
    parser.add_argument("--load_in_8bit", action="store_true", default=False)
    parser.add_argument("--load_in_4bit", action="store_true", default=False)
    parser.add_argument(
        "--dataset", default="sem_seg||refer_seg||vqa||reason_seg", type=str
    )
    parser.add_argument("--sample_rates", default="9,3,3,1", type=str)
    parser.add_argument(
        "--sem_seg_data",
        default="ade20k||cocostuff||pascal_part||paco_lvis||mapillary",
        type=str,
    )
    parser.add_argument(
        "--refer_seg_data", default="refclef||refcoco||refcoco+||refcocog", type=str
    )
    parser.add_argument("--vqa_data", default="llava_instruct_150k", type=str)
    parser.add_argument("--reason_seg_data", default="ReasonSeg|train", type=str)
    parser.add_argument("--val_dataset", default="refcoco|unc|testB", type=str)
    parser.add_argument(
        "--dataset_dir", default="/home/data1/fzd/LLM/SEG/dataset/refer_seg", type=str
    )
    parser.add_argument("--log_base_dir", default="./runs", type=str)
    parser.add_argument("--exp_name", default="lisa", type=str)
    parser.add_argument("--epochs", default=10, type=int)
    parser.add_argument("--steps_per_epoch", default=500, type=int)
    parser.add_argument(
        "--batch_size", default=1, type=int, help="batch size per device per step"
    )
    parser.add_argument("--grad_accumulation_steps", default=10, type=int)
    parser.add_argument("--val_batch_size", default=1, type=int)
    parser.add_argument("--workers", default=4, type=int)
    parser.add_argument("--lr", default=0.0003, type=float)
    parser.add_argument("--ce_loss_weight", default=1.0, type=float)
    parser.add_argument("--dice_loss_weight", default=0.5, type=float)
    parser.add_argument("--bce_loss_weight", default=2.0, type=float)
    parser.add_argument("--lora_alpha", default=16, type=int)
    parser.add_argument("--lora_dropout", default=0.05, type=float)
    parser.add_argument("--lora_target_modules", default="q_proj,v_proj", type=str)
    parser.add_argument("--explanatory", default=0.1, type=float)
    parser.add_argument("--beta1", default=0.9, type=float)
    parser.add_argument("--beta2", default=0.95, type=float)
    parser.add_argument("--num_classes_per_sample", default=3, type=int)
    parser.add_argument("--exclude_val", action="store_true", default=False)
    parser.add_argument("--no_eval", action="store_true", default=False)
    parser.add_argument("--eval_only", action="store_true", default=False)
    parser.add_argument("--vision_pretrained", default=None, type=str)
    parser.add_argument("--weight", default="", type=str)
    parser.add_argument("--print_freq", default=1, type=int)
    parser.add_argument("--start_epoch", default=0, type=int)
    return parser.parse_args(args)
```
Thanks.