## Model and tensor parallelism

Models such as the [`Translator`](python/ctranslate2.Translator.rst) and [`Generator`](python/ctranslate2.Generator.rst) can be split across multiple GPUs. This is very helpful when the model is too big to be loaded on a single GPU.

To enable tensor parallelism, load the model with `tensor_parallel=True`:

```python
translator = ctranslate2.Translator(model_path, device="cuda", tensor_parallel=True)
```
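
A [`Generator`](python/ctranslate2.Generator.rst) can be loaded the same way. The following is a minimal sketch, assuming the `Generator` constructor accepts the same `tensor_parallel` flag as the paragraph above implies:

```python
# Sketch: mirrors the Translator example above for a generative model.
generator = ctranslate2.Generator(model_path, device="cuda", tensor_parallel=True)
```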

Setup environment:
* Install [open-mpi](https://www.open-mpi.org/).
* Configure open-mpi by creating a configuration file such as ``hostfile`` that lists each host and the number of GPU slots it provides:
```bash
[ipaddress or dns] slots=nbGPU1
[other ipaddress or dns] slots=NbGPU2
```

Run:
* Run the application in multiple processes to use tensor parallelism:
```bash
mpirun -np nbGPUExpected -hostfile hostfile python3 script
```

If you are running tensor parallelism across multiple machines, some additional configuration is needed:
* Make sure the master and the slave machines can connect to each other as a pair over SSH with public key authentication.
* Export all necessary environment variables from the master to the slaves, as in the example below:
```bash
mpirun -x VIRTUAL_ENV_PROMPT -x PATH -x VIRTUAL_ENV -x _ -x LD_LIBRARY_PATH -np nbGPUExpected -hostfile hostfile python3 script
```
See the [open-mpi docs](https://www.open-mpi.org/doc/) for more information.

In this mode, the application runs as multiple processes. You can filter out all but the master process by checking the process rank:

```python
if ctranslate2.MpiInfo.getCurRank() == 0:
    print(...)
```
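
Putting the pieces together, a complete script might look like the sketch below. The model path and the pre-tokenized input are illustrative placeholders rather than values from this documentation:

```python
import ctranslate2

# Every MPI process runs the same script; with tensor_parallel=True
# the model is split across the GPUs taking part in the run.
translator = ctranslate2.Translator("ende_ctranslate2/", device="cuda", tensor_parallel=True)

# Translate a batch of pre-tokenized inputs.
results = translator.translate_batch([["▁H", "ello", "▁world", "!"]])

# Print the result only once, from the master process.
if ctranslate2.MpiInfo.getCurRank() == 0:
    print(results[0].hypotheses[0])
```

Launch it with `mpirun` as shown above, with one process per expected GPU.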

```{note}
Running a model in tensor parallel mode on a single machine can boost performance, but running a model shared between multiple machines could be slower because of the latency of the connection between them.
```

```{note}
In tensor parallel mode, `inter_threads` is still supported for running multiple workers. However, `device_index` no longer has any effect, because tensor parallel mode only considers the GPUs available on the system and the number of GPUs you want to use.
```
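
For example, multiple workers can be combined with a tensor-parallel model. A minimal sketch, where the worker count of 2 is only illustrative:

```python
# Two workers serve translation requests concurrently, sharing
# the model that is split across the available GPUs.
translator = ctranslate2.Translator(
    model_path,
    device="cuda",
    tensor_parallel=True,
    inter_threads=2,
)
```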

## Asynchronous execution

3 comments on commit ac8f7ae

- Hello @minhthuc2502, some minor improvements in wording.
- Thank you @panosk for your help. Sorry for my poor English.
- Np, thanks for your work on this!