1 file changed: +11 −1 lines changed

@@ -27,7 +27,15 @@ Requires: Linux. If using multiple machines: SSH & shared filesystem.
<h4>Example: simple training loop</h4>
- Suppose we have some distributed training function (which needs to run on every GPU):
+ Suppose we have some distributed training function (needs to run on every GPU):
+
+ ```python
+ def distributed_training(output_dir: str, num_steps: int = 10) -> str:
+     # returns path to model checkpoint
+ ```
+
+ <details>
+ <summary><b>Click to expand (implementation)</b></summary>
```python
from __future__ import annotations
@@ -63,6 +71,8 @@ def distributed_training(output_dir: str, num_steps: int = 10) -> str | None:
    return None
```
+ </details>
+
We can distribute and run this function (e.g. on 2 machines x 2 GPUs) using **`torchrunx`**!
```python