instructions
josephdviviano committed Jun 18, 2024
1 parent 4b93507 commit e2a29c0
Showing 1 changed file with 67 additions and 0 deletions.
67 changes: 67 additions & 0 deletions README.md
@@ -43,6 +43,73 @@ cd torchgfn
pip install .
```

## Installing `oneccl` bindings for multinode training

You can determine the version of `pytorch` installed using the command

```
python -c "import torch; print(torch.__version__)"
```

after which you can install the closest matching version from [this table](https://github.com/intel/torch-ccl?tab=readme-ov-file#install-prebuilt-wheel) (otherwise, you must build from source). You can see the specific wheels [here](https://pytorch-extension.intel.com/release-whl/stable/cpu/us/oneccl-bind-pt/).

```
python -m pip install oneccl_bind_pt=={pytorch_version} --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```

For example, if your PyTorch version is `2.0.1+cu117`, the closest match is `2.0.0+cpu`, so you would run `python -m pip install oneccl_bind_pt==2.0.0+cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/`.
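The version matching in the example above can be sketched in Python. This is only an illustration: the helper name `oneccl_candidate` is hypothetical, and the rule it applies (same major and minor version, patch set to `0`, `+cpu` suffix) is an assumption inferred from the single example in this section. The table linked above remains the source of truth for which wheels actually exist.

```python
def oneccl_candidate(torch_version: str) -> str:
    """Guess a oneccl_bind_pt version from a torch.__version__ string.

    Hypothetical helper: the mapping rule is an assumption, not something
    published by the oneccl_bind_pt project.
    """
    # Drop the local build tag, e.g. "+cu117" or "+cpu".
    public = torch_version.split("+", 1)[0]
    # Keep only the major and minor components.
    major, minor = public.split(".")[:2]
    # Assumption: the matching wheel is published as <major>.<minor>.0+cpu.
    return f"{major}.{minor}.0+cpu"


print(oneccl_candidate("2.0.1+cu117"))  # prints 2.0.0+cpu
```

In practice you would pass `torch.__version__` to the helper, then confirm that the suggested version appears in the wheel listing before installing it.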


**TODO:** Rough instructions, to be integrated into the docs (moved here from an email):
```
# Create the conda env
conda create -n gfn python=3.10
# Activate it
conda activate gfn
# Install the package
pip install .
pip install tqdm  # tqdm is not installed by default
# We will use the torch-ccl library for the multinode implementation. The latest
# torch-ccl is compatible with PyTorch 2.2.0, but the command above installs the
# latest torch, so we need to uninstall it and install torch 2.2.0 instead. If you
# agree that we can make it the default version, I can update it in pyproject.toml.
# Uninstall latest torch
pip uninstall torch
# Install torch version 2.2.0, CPU only
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 cpuonly -c pytorch
# Install torch-ccl
git clone https://github.com/intel/torch-ccl.git torch-ccl && cd torch-ccl
git checkout tags/v2.2.0+cpu
git submodule sync
git submodule update --init --recursive
ONECCL_BINDINGS_FOR_PYTORCH_BACKEND=cpu python setup.py install
# Installation is complete now.
# You can submit a job by modifying one of the slurm scripts and submitting it, for
# example ddp_gfn.small.8.slurm. Note that you need to change the conda env name in
# the slurm script to the name of your env. Also change the paths and dimensions if
# needed. I submit the script using the following command:
sbatch ddp_gfn.small.4.mila.slurm
```
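Before submitting a job, it may help to confirm that the bindings built in the steps above are visible to Python. The following is a minimal sketch rather than part of the original instructions. The helper name `ccl_bindings_installed` is hypothetical, and the module name `oneccl_bindings_for_pytorch` is taken from the torch-ccl project's README; older releases may expose a different module name.

```python
import importlib.util


def ccl_bindings_installed(module_name: str = "oneccl_bindings_for_pytorch") -> bool:
    """Return True if the oneCCL bindings module can be found, without importing it."""
    # find_spec returns None when no top-level module of that name is installed.
    return importlib.util.find_spec(module_name) is not None


if ccl_bindings_installed():
    print("oneCCL bindings found")
else:
    print("oneCCL bindings not found; check the torch-ccl installation steps")
```

Per the torch-ccl README, importing the module is what registers the `ccl` backend with `torch.distributed`, so a training script would typically import it before calling `torch.distributed.init_process_group(backend="ccl")`.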


## About this repo
