
Experimental environment questions for each model #1

@1st-hoan

Hello. Thank you for sharing your amazing research.
I have a question about the training environment.

  1. I am currently training for 50 epochs with a gradient accumulation step of 4 on two RTX 4090 GPUs. However, I am unable to reproduce the reported performance for DeiT-Small distilled from DeiT-Base, as shown in the attached log.txt. Which part of my environment is wrong?
  2. For this reason, could you share the number of epochs, learning rate, batch size per GPU, and number of GPUs used for each model?

The command is as follows:

python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py \
  --model deit_small_patch16_224 \
  --teacher_model deit_base \
  --epochs 50 \
  --batch-size 128 \
  --data-path /Dataset_classification/imagenet/ \
  --distillation-type soft \
  --distillation-alpha 0.5 \
  --distillation-tau 1 \
  --input-size 224 \
  --maskedkd --len_num_keep 98 \
  --accum-steps 4 \
  --output_dir /MaskedKD/work_dir/2_node_accumul4/
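
For what it's worth, one common source of such reproduction gaps is the effective batch size and the learning rate tied to it. The sketch below computes the effective batch implied by the command above and a linearly scaled learning rate; the scaling rule (base LR 5e-4 per 512 samples) is an assumption taken from the standard DeiT recipe, not something confirmed for this repo.

```python
# Effective batch size under DDP + gradient accumulation, and a
# DeiT-style linearly scaled learning rate. The per-GPU batch, GPU
# count, and accumulation steps mirror the command above; the
# 5e-4-per-512-samples base LR is an assumed DeiT default.

num_gpus = 2        # --nproc_per_node=2
batch_per_gpu = 128  # --batch-size 128
accum_steps = 4      # --accum-steps 4

effective_batch = num_gpus * batch_per_gpu * accum_steps
base_lr = 5e-4  # DeiT base LR, defined per 512-sample batch (assumption)
scaled_lr = base_lr * effective_batch / 512.0

print(f"effective batch: {effective_batch}, scaled LR: {scaled_lr}")
```

If the original runs used a different effective batch (e.g. more GPUs without accumulation), the LR that matches this 1024-sample effective batch would also differ, which alone can explain a gap of this kind.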

The log file is attached:
log.txt

Thank you!
