
[feat] add trainer factory #489

Merged

merged 1 commit on Mar 8, 2023

Conversation
@geniuspatrick (Collaborator) commented Mar 7, 2023

Thank you for your contribution to the MindCV repo.
Before submitting this PR, please make sure:

Motivation

We abstracted the fragment of the training script that creates a mindspore.Model (actually a Trainer) into the function create_trainer, a factory method for creating trainers. We believe this abstraction improves code readability.

When creating the trainer, we pass in common components such as the network, optimizer, and loss function as input parameters. We also need to pass in additional parameters to support Auto Mixed Precision (AMP). Below, we elaborate on the factory's design and principles regarding AMP.

  1. The level of AMP:
    We follow the definition from MindSpore.
    We may subsequently consider customized black and white lists.

  2. The type of loss scale:

    • fixed (w/ or w/o drop_overflow_update)
    • dynamic
    • auto

    For fixed or dynamic, we explicitly construct the LossScaleManager and pass it into mindspore.Model. For auto, we do not actively construct a LossScaleManager, but mindspore.Model may use one silently; see mindspore.train.amp for details. A minimal sketch of this mapping is given after this list.

  3. The value of loss scale:
    We raise an error if the value of the loss scale is less than 1. We no longer make a special case for loss_scale=1, although for a while we did. You can now set the loss scale type to auto to achieve the same effect.
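
For illustration, here is a minimal sketch of how such a factory might map loss_scale_type to a LossScaleManager before building mindspore.Model. The function signature and parameter names below are illustrative assumptions, not necessarily the exact interface of create_trainer in MindCV:

```python
import mindspore as ms


def create_trainer(network, loss, optimizer, metrics, amp_level="O0",
                   loss_scale_type="fixed", loss_scale=128.0, drop_overflow_update=False):
    """Illustrative factory: builds a mindspore.Model with the requested AMP/loss-scale setup."""
    if loss_scale < 1.0:
        raise ValueError("Loss scale must be no less than 1.")

    kwargs = dict(network=network, loss_fn=loss, optimizer=optimizer,
                  metrics=metrics, amp_level=amp_level)
    if loss_scale_type.lower() == "fixed":
        kwargs["loss_scale_manager"] = ms.FixedLossScaleManager(
            loss_scale=loss_scale, drop_overflow_update=drop_overflow_update)
    elif loss_scale_type.lower() == "dynamic":
        kwargs["loss_scale_manager"] = ms.DynamicLossScaleManager(
            init_loss_scale=loss_scale, scale_factor=2, scale_window=2000)
    elif loss_scale_type.lower() == "auto":
        # Do not construct a LossScaleManager; mindspore.Model may still apply one
        # silently depending on amp_level (see mindspore.train.amp).
        pass
    else:
        raise ValueError(f"Unsupported loss scale type: {loss_scale_type}")
    return ms.Model(**kwargs)
```

With this shape, the training script only calls the factory and then trains, instead of assembling Model and LossScaleManager inline.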

We have also considered support for a customized TrainStep. The current customized TrainStep supports EMA and gradient clipping. Note: the current customized TrainStep can only be used with a fixed loss scale without dropping overflow!
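
For illustration, a minimal sketch of what such a customized TrainStep could look like, here with gradient clipping only (EMA omitted for brevity); the class and parameter names are assumptions for this example rather than the exact MindCV implementation:

```python
import mindspore.nn as nn
import mindspore.ops as ops


class TrainStep(nn.TrainOneStepCell):
    """Illustrative custom train step: fixed loss scale without overflow detection,
    plus optional global-norm gradient clipping. EMA is omitted for brevity."""

    def __init__(self, network, optimizer, loss_scale=1.0, clip_grad=False, clip_value=15.0):
        # `sens` is the fixed loss scale multiplied into the initial gradient; by
        # MindSpore convention the optimizer should be built with the matching
        # `loss_scale` so that gradients are unscaled again during the update.
        super().__init__(network, optimizer, sens=loss_scale)
        self.clip_grad = clip_grad
        self.clip_value = clip_value

    def construct(self, *inputs):
        loss = self.network(*inputs)
        sens = ops.fill(loss.dtype, loss.shape, self.sens)
        grads = self.grad(self.network, self.weights)(*inputs, sens)
        grads = self.grad_reducer(grads)
        if self.clip_grad:
            grads = ops.clip_by_global_norm(grads, self.clip_value)
        loss = ops.depend(loss, self.optimizer(grads))
        return loss
```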

Test Plan

An ST (system test) already exists in tests/tasks/test_train_val_imagenet_subset.py. Do we need an additional unit test?

Related Issues and PRs

Nope

loss_scale_value=loss_scale, scale_factor=2, scale_window=2000
)
else:
    raise ValueError(f"Loss scale type only supports ['fixed', 'dynamic'], but got {loss_scale_type}.")
@Songyuanwei (Collaborator) commented Mar 8, 2023:

The loss scale type here differs from the one above and does not support auto. Could this be ambiguous for users?

@geniuspatrick (Collaborator, Author) replied:

The customized TrainStep would likewise need to support auto.

    # instead of cell, and TrainStep should be TrainOneStepCell. If drop_overflow_update is True,
    # scale_sense should be FixedLossScaleUpdateCell, and TrainStep should be TrainOneStepWithLossScaleCell.
    train_step_kwargs["scale_sense"] = nn.FixedLossScaleUpdateCell(loss_scale_value=loss_scale)
elif loss_scale_type.lower() == "dynamic":
Collaborator commented:

The current train_step only covers the case without overflow detection. If dynamic loss scale is set, train_step should detect overflow, but the current code does not.

@geniuspatrick (Collaborator, Author) replied:

A restriction on the loss scale was added later.
