Hello. Thank you for sharing your amazing research.
I have a question about the training environment.
- I am currently training for 50 epochs with 4 gradient accumulation steps on a machine with two RTX 4090 GPUs. However, I am unable to reproduce the reported performance, as shown in the attached log.txt file (DeiT-Small student distilled from a DeiT-Base teacher). Which part of my environment could be wrong?
- For this reason, I would like to ask about the number of epochs, the learning rate, the batch size per GPU, and the number of GPUs used for each model.
The command is as follows:
python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py \
--model deit_small_patch16_224 \
--teacher_model deit_base \
--epochs 50 \
--batch-size 128 \
--data-path /Dataset_classification/imagenet/ \
--distillation-type soft \
--distillation-alpha 0.5 \
--distillation-tau 1 \
--input-size 224 \
--maskedkd --len_num_keep 98 \
--accum-steps 4 \
--output_dir /MaskedKD/work_dir/2_node_accumul4/
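For reference, below is a minimal sketch of the effective global batch size implied by the command above, assuming DeiT's usual linear learning-rate scaling rule (base_lr * global_batch / 512) and DeiT's default base_lr of 5e-4; whether --accum-steps enters that rule in this repo is exactly what I am unsure about.
# Minimal sanity-check sketch (assumptions: DeiT's linear LR scaling
# rule lr = base_lr * global_batch / 512 and base_lr = 5e-4; the repo's
# actual rule, and whether it counts accumulation steps, may differ).
n_gpus = 2            # --nproc_per_node=2
batch_per_gpu = 128   # --batch-size
accum_steps = 4       # --accum-steps

global_batch = n_gpus * batch_per_gpu * accum_steps  # 2 * 128 * 4 = 1024
base_lr = 5e-4                                       # DeiT default (assumed)
scaled_lr = base_lr * global_batch / 512.0           # -> 1.0e-03

print(f"effective batch size: {global_batch}")   # 1024
print(f"scaled learning rate: {scaled_lr:.1e}")  # 1.0e-03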
The training log is attached:
log.txt
Thank you!