Description
Congratulations on the new release! Could you provide official recommendations for training on multiple GPUs, ideally alongside a full example? The examples in the repo fail for me due to unset environment variables, and I am not sure which integration (DataParallel, DistributedDataParallel, etc.) to use. The official PyTorch documentation is very thorough, but not exactly intuitive for someone who just wants to run a model quickly. My use case is training on a large dataset on a single machine with two or more GPUs using Opacus. I believe this is what most users want to do, so an end-to-end tutorial would be very useful.
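For concreteness, here is roughly the kind of end-to-end script I am hoping a tutorial would cover. This is only a sketch of what I am attempting: the model and dataset are toy placeholders, and I am guessing that `opacus.distributed.DifferentiallyPrivateDistributedDataParallel` is the intended wrapper rather than plain `torch` DDP — please correct me if not.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

from opacus import PrivacyEngine
from opacus.distributed import DifferentiallyPrivateDistributedDataParallel as DPDDP


def train(rank: int, world_size: int):
    # The repo examples seem to assume these env vars are already set,
    # which is where they fail for me when launched directly.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "12355")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Toy model and data standing in for my real ones.
    model = DPDDP(nn.Linear(16, 2).to(rank))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    dataset = TensorDataset(torch.randn(512, 16), torch.randint(0, 2, (512,)))
    loader = DataLoader(dataset, batch_size=32,
                        sampler=DistributedSampler(dataset))

    privacy_engine = PrivacyEngine()
    model, optimizer, loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=loader,
        noise_multiplier=1.0,
        max_grad_norm=1.0,
    )

    criterion = nn.CrossEntropyLoss()
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x.to(rank)), y.to(rank))
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    # One process per GPU on a single machine.
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```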
I also tried PyTorch Lightning, but training fails as soon as I set more than one GPU. Is this expected, or am I doing something wrong? Your official Lightning example does not include multi-GPU support, as far as I can tell.
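For reference, this is roughly the Lightning configuration that fails for me once I ask for more than one device. Again a sketch only: `LitModel` stands in for my actual `LightningModule` (which sets up the `PrivacyEngine` internally), and the exact `Trainer` arguments may differ across Lightning versions.

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,        # works fine with devices=1, fails with >1
    strategy="ddp",
    max_epochs=1,
)
trainer.fit(LitModel())  # LitModel is a placeholder for my Opacus-enabled module
```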
Thank you very much!