loss curve of llava-next-llama3 #12

Open
simplelifetime opened this issue Jul 1, 2024 · 3 comments

simplelifetime commented Jul 1, 2024

Thanks for your great work! I'm wondering if you could share the loss curve for training llava-next-llama3. I've observed some behaviour that differs from training llava-next-vicuna-7b, and I'm not sure whether that is normal or whether I made some mistakes during training.


hkunzhe commented Jul 1, 2024

@simplelifetime Could you share your loss curves for both llava-next-vicuna-7b and llava-next-llama3?


homiec commented Jul 27, 2024

+1, thanks


mmderakhshani commented Oct 14, 2024

Hi @simplelifetime

Regarding your question about llama3, I am getting a zero loss value during the fine-tuning stage. Did you also see the same loss values?

{'loss': 1.9166, 'learning_rate': 2.0876826722338203e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 4.1753653444676405e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 6.263048016701463e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 8.350730688935281e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.0438413361169103e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.2526096033402926e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.4613778705636743e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.6701461377870562e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.8789144050104384e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 2.0876826722338207e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 2.2964509394572026e-07, 'epoch': 0.0}
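
(A minimal sketch, not from the original comment, for turning logged lines like the ones above into a loss curve. It assumes the dict-like lines have been collected into a list, e.g. grepped from the training log, and that the x-axis is simply the logging order since logging_steps=1; the output file name is illustrative.)

import ast
import matplotlib.pyplot as plt

# Dict-like lines as printed by the trainer; in practice read them from the log file.
log_lines = [
    "{'loss': 1.9166, 'learning_rate': 2.0876826722338203e-08, 'epoch': 0.0}",
    "{'loss': 0.0, 'learning_rate': 4.1753653444676405e-08, 'epoch': 0.0}",
    # ... remaining logged lines
]

records = [ast.literal_eval(line) for line in log_lines]  # each line is a valid Python dict literal
losses = [r["loss"] for r in records]
steps = range(1, len(records) + 1)                        # one entry per logging step

plt.plot(steps, losses, marker="o")
plt.xlabel("logging step")
plt.ylabel("training loss")
plt.title("llava-next-llama3 fine-tuning loss")
plt.savefig("loss_curve.png")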

The following is my fine-tuning script:


export BASE_LR=2e-5
export VIT_LR=2e-6
DEVICE_BATCH_SIZE=2
GRADIENT_ACCU_STEPS=2

deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --version llava_llama_3 \
    --data_path ${DATA_PATH} \
    --image_folder ${LLaVA_PATH}/data \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --pretrain_mm_mlp_adapter ${OUTPUT}/checkpoints/llava-v1.6-8b_llama3-8b_pretrain_lcs-558k_ft-mlp-lr-1e-3/mm_projector.bin \
    --unfreeze_mm_vision_tower True \
    --mm_vision_tower_lr ${VIT_LR} \
    --image_aspect_ratio anyres \
    --group_by_modality_length True \
    --mm_vision_select_layer -2 \
    --mm_vision_select_feature patch \
    --mm_patch_merge_type spatial_unpad \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ${OUTPUT}/checkpoints/${SAVE_PATH} \
    --num_train_epochs 1 \
    --per_device_train_batch_size ${DEVICE_BATCH_SIZE} \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps ${GRADIENT_ACCU_STEPS} \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 7975 \
    --save_total_limit 1 \
    --learning_rate ${BASE_LR} \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 6144 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb \
    --run_name ${SAVE_PATH}
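
(A hedged diagnostic sketch, not part of the original script: a loss that collapses to 0.0 right after the first logging step is often a sign that the label tokens in each batch are being masked to the ignore index, e.g. when the conversation template selected via --version does not line up with how the Llama-3 tokenizer formats the data. The helper below is generic PyTorch; the dataloader, the "labels" batch key, and the IGNORE_INDEX value of -100 are assumptions about the training setup.)

import torch

IGNORE_INDEX = -100  # assumption: the ignore index used when building supervision labels

def count_supervised_tokens(batch):
    """Return (#label tokens that contribute to the loss, total #label tokens) for one batch."""
    labels = batch["labels"]                           # assumed shape: (batch_size, seq_len)
    supervised = (labels != IGNORE_INDEX).sum().item() # tokens that are actually supervised
    return supervised, labels.numel()

# Hypothetical usage with whatever dataloader the training script builds:
# for i, batch in enumerate(train_dataloader):
#     sup, tot = count_supervised_tokens(batch)
#     print(f"batch {i}: {sup}/{tot} label tokens supervised")
#     if sup == 0:
#         print("all labels are masked; the model receives no supervision from this batch")
#     if i == 4:
#         break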
