Commit c17be1f

Update README.md
1 parent c5f082a commit c17be1f

1 file changed (+31, −18 lines)


README.md

Lines changed: 31 additions & 18 deletions
````diff
@@ -1,39 +1,52 @@
 # torchrunx 🔥
 
-By [Apoorv Khandelwal](http://apoorvkh.com) and [Peter Curtin](https://github.com/pmcurtin)
-
 [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/torchrunx)](https://github.com/apoorvkh/torchrunx/blob/main/pyproject.toml)
 [![PyPI - Version](https://img.shields.io/pypi/v/torchrunx)](https://pypi.org/project/torchrunx/)
 ![Tests](https://img.shields.io/github/actions/workflow/status/apoorvkh/torchrunx/.github%2Fworkflows%2Fmain.yml)
 [![Docs](https://readthedocs.org/projects/torchrunx/badge/?version=stable)](https://torchrunx.readthedocs.io)
 [![GitHub License](https://img.shields.io/github/license/apoorvkh/torchrunx)](https://github.com/apoorvkh/torchrunx/blob/main/LICENSE)
 
-Automatically launch PyTorch functions onto multiple machines or GPUs
+By [Apoorv Khandelwal](http://apoorvkh.com) and [Peter Curtin](https://github.com/pmcurtin)
+
+**Automatically distribute PyTorch functions onto multiple machines or GPUs**
 
 ## Installation
 
 ```bash
 pip install torchrunx
 ```
 
-Requirements:
-- Operating System: Linux
-- Python >= 3.8.1
-- PyTorch >= 2.0
-- Shared filesystem & SSH between hosts
+Requires: Linux, Python >= 3.8.1, PyTorch >= 2.0
+
+Shared filesystem & SSH access if using multiple machines
+
+## Why should I use this?
+
+[`torchrun`](https://pytorch.org/docs/stable/elastic/run.html) is a hammer. `torchrunx` is a chisel.
 
-## Features
+Whether you have 1 GPU, 8 GPUs, or 8 machines:
 
-- Distribute PyTorch functions to multiple GPUs or machines
-- `torchrun` with the convenience of a Python function
-- Integration with SLURM
+Convenience:
 
-Advantages:
+- If you don't want to set up [`dist.init_process_group`](https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group) yourself
+- If you want to run `python myscript.py` instead of `torchrun myscript.py`
+- If you don't want to manually SSH and run `torchrun --master-ip --master-port ...` on every machine (and if you don't want to babysit these machines for hanging failures)
 
-- Self-cleaning: avoid memory leaks!
-- Better for complex workflows
-- Doesn't parallelize the whole script: just what you want
-- Run distributed functions from Python Notebooks
+Robustness:
+
+- If you want to run a complex, _modular_ workflow in one script
+  - no worries about memory leaks or OS failures
+  - don't parallelize your entire script: just the functions you want
+
+Features:
+
+- Our launch utility is super _Pythonic_
+  - If you want to run distributed PyTorch functions from Python Notebooks.
+- Automatic integration with SLURM
+
+Why not?
+
+- We don't support fault tolerance via torch elastic. Probably only useful if you are using 1000 GPUs. Maybe someone can make a PR.
 
 ## Usage
 
@@ -99,4 +112,4 @@ accuracy = trx.launch(
 )["localhost"][0]
 
 print(f'Accuracy: {accuracy}')
-```
+```
````

0 commit comments
