[Feature] Add ApexOptimWrapper #742
Conversation
Codecov Report
Additional details and impacted files

@@           Coverage Diff            @@
##             main     #742   +/-   ##
=======================================
  Coverage        ?   77.90%
=======================================
  Files           ?      133
  Lines           ?    10086
  Branches        ?     2010
=======================================
  Hits            ?     7857
  Misses          ?     1888
  Partials        ?      341
Flags with carried forward coverage won't be shown. ☔ View full report at Codecov.
Hi! Thanks for your contribution. It seems the current implementation does not use apex.amp.initialize to prepare the model and optimizer. I know there exist some limitations that make this hard to implement in MMEngine, and I want to discuss how we could support ApexOptimWrapper better.
1. If we don't call initialize, will mixed-precision training work?
2. As the official apex documentation describes, we need to call amp.initialize() before wrapping the model with DDP, and the optimizer must already exist at that point. However, there is a conflict: the PyTorch tutorial suggests passing the parameters of the DDP-wrapped model to the optimizer. That is actually reasonable, since distributed wrappers like FSDP overwrite the original parameters, so we have to pass their parameters to the optimizer after wrapping the model. This means that by the time we get the optimizer, the model may already have been wrapped by DDP, which conflicts with apex's requirement (see the contrasting snippet after the apex example below).
In MMEngine, OptimWrapper.optim_context can access both the DDP-wrapped model and the optimizer. I'm not sure whether we can call amp.initialize there (maybe initialize the model inside the DDP wrapper in place?).
3. We need to consider how to resume the optimizer.
my_model = Model()
my_opt = SGD(...)
my_model, my_opt = apex.amp.initialize(my_model, my_opt, opt_level='O1', loss_scale=...)
ddp_model = DDP(my_model)
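For contrast, the order suggested by the PyTorch DDP tutorial looks roughly like this (a sketch, reusing my_model from the snippet above):

from torch.nn.parallel import DistributedDataParallel as DDP
from torch.optim import SGD

# PyTorch's tutorial builds the optimizer from the wrapped model's
# parameters, i.e. the opposite order from what apex expects.
ddp_model = DDP(my_model)
my_opt = SGD(ddp_model.parameters(), lr=0.01)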
Ahhh. If we want to use ApexOptimWrapper independently, the key problem is where we should call amp.initialize. The second problem is compatibility with DDP training, as mentioned above:
In the apex tutorial, apex.amp has its own way to save and load checkpoints; we should take it into consideration.
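For reference, the checkpointing pattern from the apex documentation is roughly the following (a sketch; model and optimizer are assumed to already exist):

import torch
from apex import amp

# Saving: also store amp's own state (e.g. the dynamic loss scaler).
checkpoint = {
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'amp': amp.state_dict(),
}
torch.save(checkpoint, 'amp_checkpoint.pt')

# Resuming: call amp.initialize first, then restore the saved states.
checkpoint = torch.load('amp_checkpoint.pt')
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
amp.load_state_dict(checkpoint['amp'])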
Hi, I have an interest in this feature. One of the crucial changes to support ... As mentioned by @HAOCHENYE, these ... So we had to write a new ... Can we discuss this? (in this PR, a new issue, or the discussion board)
@HAOCHENYE
@nijkah @xcnick Since initialize seems to only patch the model's forward, maybe we can call it on the unwrapped module:

# suppose in ddp training
amp.initialize(model.module, self.optimizer)

If it works, I think it could be a temporary solution to support ApexOptimWrapper.
Hi, @nijkah! I've created a discussion thread. You can paste these comments there and we can discuss it there! And @xcnick, since this PR is related, your opinions and ideas are also welcome!
def __init__(self, opt_level='O1', loss_scale='dynamic', **kwargs):
    super().__init__(**kwargs)
Suggested change:

def __init__(self, opt_level='O1', loss_scale='dynamic', **kwargs):
    assert apex_amp is not None, \
        'Apex is not installed. Please check https://github.com/NVIDIA/apex#linux.'
    super().__init__(**kwargs)
The assertion logic could be added here.
except ImportError:
    pass
Suggested change:

except ImportError:
    apex_amp = None
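Put together, the guarded import would look roughly like this (sketch):

try:
    from apex import amp as apex_amp
except ImportError:
    # Sentinel checked in ApexOptimWrapper.__init__ so a clear error
    # message can be raised when apex is missing.
    apex_amp = None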
@nijkah Thanks for your suggestion.
if isinstance(model, torch.nn.parallel.DistributedDataParallel):
    model = model.module
with super().optim_context(model):
    model, self.optimizer = apex_amp.initialize(
        model,
This is just a question. When I checked the apex documentation, I couldn't find any description of handling it like this. Isn't there any issue in training when handling it this way?
This is actually what I was concerned about. I took a cursory look at part of the implementation of apex.amp.initialize: it seems initialize only patches the forward and registers some custom hooks on it. I'm not sure whether doing this on model.module will work or not; we need to check that the memory savings and training speedup actually take effect when ApexOptimWrapper is enabled.
@xcnick could you provide a comparison of nvidia-smi output with ApexOptimWrapper enabled and disabled?
OK, there is something wrong with the current implementation.
In examples/distributed_training.py, compare the following two options:
batch_size=1024,
...
optim_wrapper=dict(
    type='ApexOptimWrapper', opt_level='O1', loss_scale=10,
    optimizer=dict(type=Adam, lr=0.001)),
...
and
batch_size=1024,
...
optim_wrapper=dict(optimizer=dict(type=SGD, lr=0.001, momentum=0.9)),
result:

optim_wrapper | nvidia-smi | memory in log file (MB)
---|---|---
ApexOptimWrapper | 11GB | 5574
original OptimWrapper | 5.2GB | 2249
It seems that calling amp.initialize in optim_context may not be the correct way.
You may be able to participate in the related discussion if you are interested.
#749 (comment)
Emmm, the conclusion confuses me. I tried to train RetinaNet-R50 with MMDetection based on this PR to figure out why it does not work. However, judging from the nvidia-smi result, it seems that ApexOptimWrapper has worked.... I also tested examples/distributed_training.py, and the result is the same.
So... what happened 🤣
I'm not sure whether this implementation can work correctly, so good luck ^^
In addition, the to method of BaseModel in base_model.py needs to be modified for compatibility with O0 and O3 modes.
> In addition, the to method of BaseModel in base_model.py needs to be modified for compatibility with O0 and O3 modes.
Do you mean the modifications in #783? If they are not enough, feel free to leave a comment!
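Not sure whether this matches what #783 does, but one hypothetical way to tolerate the dtype-only to() calls that apex can make in O0/O3 modes is to forward arbitrary arguments to nn.Module.to (a sketch; BaseModel and _set_device are simplified stand-ins):

import torch
import torch.nn as nn


class BaseModel(nn.Module):
    """Simplified stand-in; only the ``to`` override is shown."""

    def to(self, *args, **kwargs) -> nn.Module:
        # Reuse PyTorch's own argument parsing so calls such as
        # model.to(torch.float16) do not break the device bookkeeping.
        device = torch._C._nn._parse_to(*args, **kwargs)[0]
        if device is not None:
            self._set_device(torch.device(device))
        return super().to(*args, **kwargs)

    def _set_device(self, device: torch.device) -> None:
        # Illustrative hook: propagate the device to e.g. a data preprocessor.
        pass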
Great job!
Sorry for my late reply. To give the conclusion first: the current implementation seems OK. I used ApexOptimWrapper to train ATSS in MMDetection, and the loss converged normally; it will take some time to verify the accuracy.
The reason for the late reply is that I spent some time trying to train RetinaNet (O1) with ApexOptimWrapper. However, it turned out that neither AmpOptimWrapper nor ApexOptimWrapper can train RetinaNet normally (the loss is nan), which is determined by the model's own characteristics and has nothing to do with the implementation of ApexOptimWrapper.
Hi~ is there any progress? Besides, the CLA should be signed again 🤣.
if hasattr(self.optimizer, '_amp_stash'):
    yield
Suggested change:

if hasattr(self.optimizer, '_amp_stash'):
    with super().optim_context(model):
        yield
Does it mean that the initialized optimizer will have the _amp_stash attribute? Maybe we need to add some comments here.
``ApexOptimWrapper`` requires
[nvidia apex](https://github.com/NVIDIA/apex).
Suggested change:

``ApexOptimWrapper`` requires `nvidia apex <https://github.com/NVIDIA/apex>`_
Args:
    **kwargs: Keyword arguments passed to OptimWrapper.
The argument descriptions are missing here.
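For example, the docstring could describe them along these lines (the wording is only a suggestion):

Args:
    opt_level (str): Pure or mixed precision optimization level chosen by
        apex, e.g. 'O0', 'O1', 'O2' or 'O3'. Defaults to 'O1'.
    loss_scale (float or str): Loss scaling value, or 'dynamic' to let apex
        adjust the scale automatically. Defaults to 'dynamic'.
    **kwargs: Keyword arguments passed to OptimWrapper.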
``accumulative_counts``.
"""

def __init__(self, opt_level='O1', loss_scale='dynamic', **kwargs):
Missing type hints.
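For instance (a sketch; apex accepts either a number or the string 'dynamic' for loss_scale):

from typing import Union

def __init__(self,
             opt_level: str = 'O1',
             loss_scale: Union[float, str] = 'dynamic',
             **kwargs) -> None:
    ...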
self.opt_level = opt_level
self.loss_scale = loss_scale

def backward(self, loss: torch.Tensor, **kwargs):
Suggested change:

def backward(self, loss: torch.Tensor, **kwargs) -> None:
state_dict['apex_amp'] = apex_amp.state_dict()
return state_dict

def load_state_dict(self, state_dict: dict):
Suggested change:

def load_state_dict(self, state_dict: dict) -> None:
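Beyond the type hint, the load counterpart of the 'apex_amp' entry saved in state_dict above could presumably look like this (a sketch, not the actual PR code):

def load_state_dict(self, state_dict: dict) -> None:
    """Load optimizer state, restoring apex's loss-scaler state if saved."""
    if 'apex_amp' in state_dict:
        # counterpart of ``state_dict['apex_amp'] = apex_amp.state_dict()``
        apex_amp.load_state_dict(state_dict.pop('apex_amp'))
    self.optimizer.load_state_dict(state_dict)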
else:
    if isinstance(model, torch.nn.parallel.DistributedDataParallel):
        model = model.module
    with super().optim_context(model):
super().optim_context should be entered every iteration to avoid unnecessary gradient synchronization during gradient accumulation.
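In other words, something along these lines (a sketch of the method body only; the enclosing class and exact signature are omitted):

@contextmanager
def optim_context(self, model):
    # one-time apex initialization on the unwrapped model
    if not hasattr(self.optimizer, '_amp_stash'):
        if isinstance(model, torch.nn.parallel.DistributedDataParallel):
            model = model.module
        model, self.optimizer = apex_amp.initialize(
            model, self.optimizer,
            opt_level=self.opt_level, loss_scale=self.loss_scale)
    # entered on every iteration so that gradient accumulation can skip
    # unnecessary gradient synchronization
    with super().optim_context(model):
        yield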
Sorry for the late response, I updated the code.
Please add ...
# when a given optimizer is passed through apex_amp.initialize,
# the '_amp_stash' attribute will be added to it
if hasattr(self.optimizer, '_amp_stash'):
    yield
It seems that a return is missing here; otherwise apex_amp.initialize will be called every iteration.
It's weird that yield is followed by return, but it makes sense in a contextmanager.
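A tiny standalone illustration of the yield-then-return pattern (names are made up):

from contextlib import contextmanager

@contextmanager
def maybe_setup(already_initialized: bool):
    if already_initialized:
        yield   # fast path: nothing to set up
        return  # skip the one-time setup below
    print('one-time setup')  # runs only on the first call
    yield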
We can simplify the logic.

if not hasattr(self.optimizer, '_amp_stash'):
    if isinstance(model, torch.nn.parallel.DistributedDataParallel):
        model = model.module
    model, self.optimizer = apex_amp.initialize(xxx)
yield
apex_optim_wrapper = ApexOptimWrapper(
    optimizer=optimizer, opt_level='O1', loss_scale=1)
with apex_optim_wrapper.optim_context(self.model):
    apex_optim_wrapper.optimizer.param_groups = MagicMock()
Sometimes we use MagicMock to assert that some function or method has been called, but why do we need to mock param_groups here?
Hi @xcnick, thanks for your contribution. This PR can be merged after resolving the final comments above and validating it on your local machine.
Hi! As the official documentation says: ... The current implementation of ...
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and more likely to receive feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
Modification
Add ApexOptimWrapper for mmengine.
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist