
[FEATURE] Support repeatability experiments #1939

Closed
@wangat

Description


Thank you for your code and engineering work. I use the following code to fix the seed, and in my tests it fully reproduces comparison experiments. (I have found that unfixed or only partially fixed seeds can shift the evaluation metric by ±2%, which is unacceptable in comparative experiments.)

I ran some test experiments and obtained the following results:

1. After shutdown and restart, keeping the same parameters completely reproduces the previous experiment.
2. On the same server with the same type of GPU, single-GPU training and multi-GPU training with the same number of GPUs give identical results.
3. With the same seed and hyperparameters but a different GPU, the final results differ.
4. Different hardware with the same GPU model also produces different results.
5. Resuming training after an interruption gives different results from training straight through (I guess this is related to the epoch and learning-rate state; sorry, I have not finished studying the relevant code).

I have tested multiple model families, including resnet, mobilenet, efficientnet, efficientformer, vit, levit, and xcit. However, I found that the efficientformerv2_s1 model was not completely fixed; some other factor in the code prevents full reproducibility. Testing on the same GPU on the same server, I saw a slight difference in results during the first epoch; in addition, across multiple experiments on the same GPU, a gap appeared in the second epoch. I am running experiments and searching other articles to find the cause, but I have not determined it yet. Could you please help me find it?
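One way to hunt for the remaining source of drift (a suggestion, not something from the original report): PyTorch can flag every op that lacks a deterministic implementation. The sketch below assumes a reasonably recent PyTorch (`warn_only` needs >= 1.11); each offending op emits a warning the first time it runs, which points at the layer responsible.

```python
import warnings

import torch

# Ask PyTorch to warn (rather than raise) whenever an op without a
# deterministic implementation is executed on this process.
torch.use_deterministic_algorithms(True, warn_only=True)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Run one forward/backward pass of the suspect model here; any
    # nondeterministic op it touches will show up in `caught`.
    x = torch.randn(4, 8, requires_grad=True)
    torch.nn.functional.relu(x).sum().backward()

# Inspect caught warnings to locate the nondeterministic op(s).
print(torch.are_deterministic_algorithms_enabled())
```

Running the real training step inside the `catch_warnings` block (instead of the toy `relu` above) should name the op that breaks reproducibility for efficientformerv2_s1.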

I modified random.py in utils using the following code:

```python
import os
import random

import numpy as np
import torch

torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
random.seed(seed)
# Note: setting this here only affects subprocesses; to cover the
# current process, PYTHONHASHSEED must be set before Python starts.
os.environ['PYTHONHASHSEED'] = str(seed)

# These two settings make training slower but deterministic.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```
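Two further seed sources that the snippet above does not cover, and which in my experience commonly explain "reproducible except for a small drift" cases like this one (a hedged sketch, not part of the original fix): NumPy/`random` state inside DataLoader worker processes, and cuBLAS workspace selection on CUDA. The names `SEED` and `seed_worker` below are my own illustration.

```python
import os
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

SEED = 42

# Some CUDA matmul paths are only deterministic when cuBLAS uses a
# fixed workspace; this must be set before the first CUDA call.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

def seed_worker(worker_id):
    # Each DataLoader worker gets its own torch seed automatically,
    # but NumPy and random state in workers must be re-seeded by hand.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

# A dedicated generator makes the shuffle order independent of any
# other consumer of the global RNG.
g = torch.Generator()
g.manual_seed(SEED)

dataset = TensorDataset(torch.arange(8).float())
loader = DataLoader(dataset, batch_size=2, shuffle=True,
                    num_workers=0, worker_init_fn=seed_worker,
                    generator=g)

# Re-seeding the generator reproduces the exact shuffle order.
first_pass = [batch[0].tolist() for batch in loader]
g.manual_seed(SEED)
second_pass = [batch[0].tolist() for batch in loader]
assert first_pass == second_pass
```

If the drift appears only from the second epoch onward, the DataLoader generator is a likely suspect, since the first epoch's shuffle can match by construction while later epochs diverge.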


Labels: enhancement (New feature or request)
