[feat] add trainer factory #489
Conversation
```python
        loss_scale_value=loss_scale, scale_factor=2, scale_window=2000
    )
else:
    raise ValueError(f"Loss scale type only supports ['fixed', 'dynamic'], but got {loss_scale_type}.")
```
The loss scale type here differs from the one above and does not support `auto`. Could this confuse users?
Likewise, the customized `TrainStep` needs to support `auto`.
```python
    # instead of cell, and TrainStep should be TrainOneStepCell. If drop_overflow_update is True,
    # scale_sense should be FixedLossScaleUpdateCell, and TrainStep should be TrainOneStepWithLossScaleCell.
    train_step_kwargs["scale_sense"] = nn.FixedLossScaleUpdateCell(loss_scale_value=loss_scale)
elif loss_scale_type.lower() == "dynamic":
```
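For context, here is a minimal sketch of the pairing described in the comment above, using MindSpore's public cells; the network, loss, and keyword values below are illustrative placeholders, not this PR's actual code:

```python
from mindspore import nn

network = nn.Dense(32, 10)  # placeholder network
loss_fn = nn.MSELoss()      # placeholder loss
net_with_loss = nn.WithLossCell(network, loss_fn)
optimizer = nn.Momentum(network.trainable_params(), learning_rate=0.1, momentum=0.9)

# When overflow updates are dropped, scale_sense is a loss-scale update cell
# and the train step is TrainOneStepWithLossScaleCell.
scale_sense = nn.FixedLossScaleUpdateCell(loss_scale_value=1024.0)
train_step = nn.TrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_sense=scale_sense)
```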
Currently, train_step only covers the case without overflow detection. If dynamic loss scale is configured, train_step should detect overflow, but the current code does not.
A restriction on the loss scale was added later.
Thank you for your contribution to the MindCV repo.
Motivation
We abstracted the fragment of the training script that creates a `mindspore.Model` (actually a Trainer) into the function `create_trainer`, a factory method for creating trainers. We believe this abstraction improves code readability.

When creating the trainer, we pass in common components such as the network, optimizer, and loss function as input parameters. We also pass in additional parameters to support Auto Mixed Precision (AMP). Below, we elaborate on the factory's design and principles regarding AMP.
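A minimal usage sketch; the exact signature of `create_trainer` and its import path may differ from what is assumed here, and the parameter names are drawn from this description rather than the actual code:

```python
from mindspore import nn

from mindcv.utils import create_trainer  # import path assumed

network = nn.Dense(32, 10)  # stand-in for a real backbone
loss = nn.CrossEntropyLoss()
optimizer = nn.Momentum(network.trainable_params(), learning_rate=0.1, momentum=0.9)

trainer = create_trainer(
    network,
    loss,
    optimizer,
    metrics={"accuracy"},
    amp_level="O2",           # AMP level, following MindSpore's definition
    loss_scale_type="fixed",  # "fixed", "dynamic", or "auto"
    loss_scale=1024.0,
)
# The trainer behaves like a mindspore.Model, e.g. trainer.train(num_epochs, dataset)
```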
The level of AMP:

- We follow the definition from MindSpore (see the sketch after this list).
- We may subsequently consider customized black and white lists.
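As a rough illustration, the level is ultimately forwarded to `mindspore.Model`; this sketch assumes standard MindSpore semantics, and the available levels can vary by version:

```python
import mindspore as ms
from mindspore import nn

network = nn.Dense(32, 10)
loss = nn.CrossEntropyLoss()
optimizer = nn.Momentum(network.trainable_params(), learning_rate=0.1, momentum=0.9)

# "O0" keeps float32; higher levels cast more of the network to float16,
# following MindSpore's amp_level definition.
model = ms.Model(network, loss_fn=loss, optimizer=optimizer, amp_level="O2")
```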
The type of loss scale:

For `fixed` or `dynamic`, we explicitly construct the `LossScaleManager` and pass it to `mindspore.Model`. For `auto`, we do not actively construct the `LossScaleManager`, but `mindspore.Model` may use one silently; see mindspore.train.amp for details. A sketch of both explicit cases follows.
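A sketch of the explicit construction for the first two types, using MindSpore's standard managers; the keyword values are illustrative:

```python
import mindspore as ms
from mindspore import nn

network = nn.Dense(32, 10)
loss = nn.CrossEntropyLoss()
optimizer = nn.Momentum(network.trainable_params(), learning_rate=0.1, momentum=0.9)

# fixed: a constant loss scale for the whole run.
fixed_manager = ms.FixedLossScaleManager(loss_scale=1024.0, drop_overflow_update=False)

# dynamic: the scale grows or shrinks based on observed overflows.
dynamic_manager = ms.DynamicLossScaleManager(init_loss_scale=2**16, scale_factor=2, scale_window=2000)

model = ms.Model(network, loss_fn=loss, optimizer=optimizer, loss_scale_manager=fixed_manager)
# For "auto", no manager is passed and mindspore.Model decides internally.
```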
The value of loss scale:

We raise an error if the value of the loss scale is less than 1. We no longer make a special case for `loss_scale=1`, although for a while we did; you can now set the type of loss scale to `auto` to achieve the same effect.

We have also considered support for a customized `TrainStep`. The current customized `TrainStep` supports EMA and gradient clipping. Note: the current customized `TrainStep` can only be used with a fixed loss scale without dropping overflow! A rough sketch follows.
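The sketch below shows what such a customized train step could look like; it is not the actual implementation in this PR, the class and parameter names are hypothetical, and it assumes a fixed loss scale (`sens`) with no overflow dropping:

```python
from mindspore import nn, ops

class TrainStepWithEmaAndClip(nn.TrainOneStepCell):
    """Hypothetical sketch: fixed loss scale via sens, no overflow dropping,
    plus gradient clipping and an EMA copy of the weights."""

    def __init__(self, network, optimizer, sens=1.0, clip_norm=1.0, ema_decay=0.9999):
        super().__init__(network, optimizer, sens)
        self.clip_norm = clip_norm
        self.ema_decay = ema_decay
        # Shadow parameters holding the exponential moving average.
        self.ema_weights = self.weights.clone(prefix="ema", init="same")

    def construct(self, *inputs):
        loss = self.network(*inputs)
        # Fixed loss scale: the gradient sensitivity is a constant.
        sens = ops.ones_like(loss) * self.sens
        grads = self.grad(self.network, self.weights)(*inputs, sens)
        grads = self.grad_reducer(grads)
        # Gradient clipping by global norm.
        grads = ops.clip_by_global_norm(grads, self.clip_norm)
        loss = ops.depend(loss, self.optimizer(grads))
        # EMA update after the optimizer step: ema = d * ema + (1 - d) * w.
        for ema_w, w in zip(self.ema_weights, self.weights):
            ops.assign(ema_w, ema_w * self.ema_decay + w * (1.0 - self.ema_decay))
        return loss
```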
Test Plan

The st is already in `tests/tasks/test_train_val_imagenet_subset.py`. Do we need an additional unit test?

Related Issues and PRs

Nope