A test DDP script can be launched with torchX with:
```sh
torchx run
```
See [.torchxconfig](.torchxconfig), [torchx.py](./torchft/torchx.py), and the [torchX documentation](https://pytorch.org/torchx/latest/) to understand how DDP is being run.
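
`.torchxconfig` is an INI-style file that supplies defaults to `torchx run`, which is why the bare command above works. A minimal sketch is shown below; the component path and scheduler are illustrative placeholders rather than the values shipped in this repository, so refer to the actual [.torchxconfig](.torchxconfig):

```ini
# Illustrative sketch only -- the component path and scheduler are placeholders;
# see the repository's .torchxconfig for the real values.
[cli:run]
# Component invoked by a bare `torchx run` (file path + component function name)
component = ./torchft/torchx.py:<component_fn>
# Launch locally from the current working directory
scheduler = local_cwd
```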
`torchx.py` can also launch HSDP jobs when `workers_per_replica` is set to a value greater than 1, provided the training script supports it. For an example HSDP training implementation with torchFT enabled, see [torchtitan](https://github.com/pytorch/torchtitan).
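
As a rough sketch, assuming the component function defined in `torchx.py` exposes `workers_per_replica` as a component argument (the component function name below is a placeholder), an HSDP launch could look like:

```sh
# Sketch only: replace <component_fn> with the component function defined in
# ./torchft/torchx.py. workers_per_replica > 1 gives each replica group several
# workers, allowing sharding within the group (HSDP) in addition to replication
# across groups.
torchx run ./torchft/torchx.py:<component_fn> --workers_per_replica 2
```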
Alternatively, to test on a node with two GPUs, you can launch two replica groups running [train_ddp.py](./train_ddp.py) as follows.

On shell 1 (one replica group starts initial training):
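
A minimal sketch of the shell 1 launch, assuming `train_ddp.py` reads its replica group id and group count from environment variables and that a lighthouse is reachable at the default local address (the variable names here are assumptions; consult [train_ddp.py](./train_ddp.py) for the exact ones):

```sh
# Sketch only: the environment variable names are assumptions, check
# train_ddp.py for the real ones. Pins this replica group to GPU 0 and starts a
# single-process training run that connects to the local lighthouse.
export REPLICA_GROUP_ID=0
export NUM_REPLICA_GROUPS=2

CUDA_VISIBLE_DEVICES=0 TORCHFT_LIGHTHOUSE=http://localhost:29510 \
    torchrun --master_port=29600 --nnodes=1 --nproc_per_node=1 train_ddp.py
```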