Lazily initialize CUDA devices (take 2)#613

Merged
soumith merged 1 commit into torch:master from colesbury:lazy
Nov 26, 2016
Conversation

@colesbury (Contributor)

Previously, cutorch would initialize every CUDA device and enable P2P
access between all pairs. This slows down start-up, especially with 8
devices. Now, THCudaInit does not initialize any devices and P2P access
is enabled lazily. Setting the random number generator seed also does
not initialize the device until random numbers are actually used.

I've updated the Storage copy code to delegate to the Tensor copy code. This fixes the issues with P2P not being enabled and adds proper inter-GPU synchronization (see #612).

@soumith soumith merged commit e2051b6 into torch:master Nov 26, 2016
@soumith (Member) commented Nov 26, 2016

thanks!
