Hello. Thank you for sharing your amazing research.
I have a question about the training environment.
- I am currently training for 50 epochs with 4 gradient accumulation steps on a machine with two RTX 4090 GPUs. However, I am unable to reproduce the reported performance, as shown in the attached log.txt file (DeiT-Small student distilled from a DeiT-Base teacher). Which part of my environment could be wrong?
- For this reason, I would like to ask about the number of epochs, the learning rate, the batch size per GPU, and the number of GPUs used for each model.
The command is as follows:
python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py \
--model deit_small_patch16_224 \
--teacher_model deit_base \
--epochs 50 \
--batch-size 128 \
--data-path /Dataset_classification/imagenet/ \
--distillation-type soft \
--distillation-alpha 0.5 \
--distillation-tau 1 \
--input-size 224 \
--maskedkd --len_num_keep 98 \
--accum-steps 4 \
--output_dir /MaskedKD/work_dir/2_node_accumul4/
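For reference, below is a minimal sketch of the effective global batch size implied by the command above, assuming DeiT's usual linear learning-rate scaling rule (base_lr * global_batch / 512) and DeiT's default base_lr of 5e-4; whether --accum-steps enters that rule in this repo is exactly what I am unsure about.
# Minimal sanity-check sketch (assumptions: DeiT's linear LR scaling
# rule lr = base_lr * global_batch / 512 and base_lr = 5e-4; the repo's
# actual rule, and whether it counts accumulation steps, may differ).
n_gpus = 2            # --nproc_per_node=2
batch_per_gpu = 128   # --batch-size
accum_steps = 4       # --accum-steps

global_batch = n_gpus * batch_per_gpu * accum_steps  # 2 * 128 * 4 = 1024
base_lr = 5e-4                                       # DeiT default (assumed)
scaled_lr = base_lr * global_batch / 512.0           # -> 1.0e-03

print(f"effective batch size: {global_batch}")   # 1024
print(f"scaled learning rate: {scaled_lr:.1e}")  # 1.0e-03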
The training log is attached:
log.txt
Thank you!