Multi-GPU Support #463
Is this ready to be tested?
@Disservin it works correctly (verified by Viren on 1, 2, 3, and 4x 4090s), but atm it needs a large batch size to give a good speedup because the gradient accumulation code is basic and suboptimal.
So the interconnect is too limiting for it to be worth much?
@Disservin yes, but if you increase the batch size several times over, the transfers become infrequent enough that it is a speedup on a normal multi-GPU setup (the transfers are just weight gradients/values, so their size does not depend on batch size).
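To make the batch-size argument concrete, here is a rough cost model, not the PR's actual code. It assumes compute time splits evenly across GPUs and that the gradient/weight transfer is a fixed per-step cost that does not grow with batch size. The function name, parameters, and all numbers are hypothetical and chosen only to show the trend.

```python
def multi_gpu_speedup(batch_size, num_gpus, secs_per_sample, transfer_secs):
    """Estimated speedup of one training step relative to a single GPU.

    Assumptions (illustrative only):
      - compute time is proportional to the number of samples on a GPU
      - the batch is split evenly across GPUs
      - the gradient/weight transfer costs a fixed time per step,
        independent of batch size
    """
    single = batch_size * secs_per_sample
    multi = single / num_gpus + transfer_secs
    return single / multi


if __name__ == "__main__":
    # Hypothetical timings: 0.1 microseconds per sample, 2 ms per transfer.
    for batch_size in (16_384, 65_536, 262_144):
        speedup = multi_gpu_speedup(
            batch_size, num_gpus=4, secs_per_sample=1e-7, transfer_secs=2e-3
        )
        print(f"batch {batch_size:>7}: {speedup:.2f}x")
```

Under these assumptions a small batch can end up slower than a single GPU, while larger batches amortize the fixed transfer cost and approach, but never reach, a speedup equal to the GPU count.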
I will be trying some alternative strategies to reduce the latency of the interconnect. |