
Lightning creates two DeepSpeedEngine instances for the same model #17523

Closed
@HeyangQin


Bug description

Hello Lightning team!

We have received several user reports (e.g. deepspeedai/DeepSpeed#3068) about errors when using Lightning with DeepSpeed. The issue is that Lightning creates two DeepSpeedEngine instances for the same model at https://github.com/Lightning-AI/lightning/blob/6ec9a6bd9e792f505ebc931742d4235f311eb289/src/lightning/pytorch/strategies/deepspeed.py#L447-L450
Neither DeepSpeedEngine is aware of the other's existence, so under ZeRO stage 3 optimization the two engines each manage and operate on the same set of parameters independently, which leads to the crash.
We tried to tackle this from our end by binding the parameter management to the model so it could be shared among DeepSpeedEngine instances, but we realized that Lightning creates different wrapper instances for the model before passing it to DeepSpeed, so from DeepSpeed's perspective they look like different models.
DeepSpeed can run both training and validation on the same DeepSpeedEngine instance, so we want to reach out to understand the intuition behind using multiple DeepSpeedEngines (or wrappers), and to check whether there is anything we can do on our end to make a single DeepSpeedEngine usable for both training and validation in your use case.
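To make the pattern concrete, here is a minimal sketch of what we mean. This is not Lightning's actual code: the `Wrapper` class, the toy model, and the ZeRO config below are assumptions for illustration only, and the script is assumed to be launched with the `deepspeed` launcher on a single GPU.

```python
import deepspeed
import torch


# Illustrative stand-in for the separate module wrappers built around the
# same underlying model before it is handed to DeepSpeed (hypothetical class,
# not Lightning's).
class Wrapper(torch.nn.Module):
    def __init__(self, module: torch.nn.Module):
        super().__init__()
        self.module = module

    def forward(self, x):
        return self.module(x)


model = torch.nn.Linear(8, 8)  # toy model standing in for the user's module

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

# First engine, e.g. for training.
train_wrapper = Wrapper(model)
train_engine, _, _, _ = deepspeed.initialize(
    model=train_wrapper,
    model_parameters=train_wrapper.parameters(),
    config=ds_config,
)

# Second engine, e.g. for validation, built from a different wrapper around
# the same underlying model. Neither engine is aware of the other, yet with
# ZeRO stage 3 both try to partition and manage the same parameters, which is
# the clash described above.
eval_wrapper = Wrapper(model)
eval_engine, _, _, _ = deepspeed.initialize(
    model=eval_wrapper,
    model_parameters=eval_wrapper.parameters(),
    config=ds_config,
)
```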

What version are you seeing the problem on?

master

How to reproduce the bug

There is a nice reproduction script from an affected user: https://github.com/microsoft/DeepSpeed/issues/3068#issuecomment-1486539136

Error messages and logs

No response

Environment

No response

More info

No response

cc @awaelchli
