Lazily initialize CUDA devices (take 2) by colesbury · Pull Request #613 · torch/cutorch

colesbury · 2016-11-25T23:53:08Z

Previously, cutorch would initialize every CUDA device and enable P2P
access between all pairs. This slows down start-up, especially with 8
devices. Now, THCudaInit does not initialize any devices and P2P access
is enabled lazily. Setting the random number generator seed also does
not initialize the device until random numbers are actually used.
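A minimal sketch of the lazy pattern described above, written in plain C so it runs without a GPU. `LazyState`, `ensure_device`, `ensure_p2p`, `LazyRng`, and `rng_next` are hypothetical names, not cutorch's THC API. Comments mark where real code would call the CUDA runtime, and a toy linear congruential generator stands in for the GPU random number generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch, not cutorch's actual implementation. */
#define MAX_DEVICES 8

typedef struct {
  int num_devices;
  bool device_ready[MAX_DEVICES];             /* set on first use of a device */
  bool p2p_checked[MAX_DEVICES][MAX_DEVICES]; /* set on first copy src -> dst */
  int device_inits;                           /* counters to observe laziness */
  int p2p_enables;
} LazyState;

/* Analogue of an init function that touches no device at all. */
void lazy_init(LazyState *s, int num_devices) {
  assert(num_devices > 0 && num_devices <= MAX_DEVICES);
  *s = (LazyState){0};
  s->num_devices = num_devices;
}

/* Initialize a device only the first time something needs it. */
void ensure_device(LazyState *s, int dev) {
  assert(dev >= 0 && dev < s->num_devices);
  if (!s->device_ready[dev]) {
    /* real code: cudaSetDevice(dev) plus per-device setup */
    s->device_ready[dev] = true;
    s->device_inits++;
  }
}

/* Enable peer access for one ordered (src, dst) pair on its first copy. */
void ensure_p2p(LazyState *s, int src, int dst) {
  ensure_device(s, src);
  ensure_device(s, dst);
  if (src != dst && !s->p2p_checked[src][dst]) {
    /* real code: query cudaDeviceCanAccessPeer and, if supported,
       call cudaDeviceEnablePeerAccess */
    s->p2p_checked[src][dst] = true;
    s->p2p_enables++;
  }
}

typedef struct {
  uint64_t seed;
  bool generator_ready;
  uint64_t state;
} LazyRng;

/* Seeding only records the seed; no device is touched. */
void rng_manual_seed(LazyRng *r, uint64_t seed) {
  r->seed = seed;
  r->generator_ready = false;
}

/* The device and generator are set up on the first draw. */
uint64_t rng_next(LazyState *s, LazyRng *r, int dev) {
  if (!r->generator_ready) {
    ensure_device(s, dev);
    r->state = r->seed;
    r->generator_ready = true;
  }
  /* toy LCG standing in for the real GPU generator */
  r->state = r->state * 6364136223846793005ULL + 1442695040888963407ULL;
  return r->state;
}
```

Because each flag is checked on first use, a process that only touches device 0 never pays to set up the other seven devices or the 56 ordered peer pairs among 8 devices. This sketch is single-threaded; real code would need to guard the flags against concurrent first use.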

I've updated the Storage copy code to delegate to the Tensor copy code. This
fixes the issue of P2P access not being enabled and adds proper inter-GPU
synchronization (see #612).
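A hedged sketch of the delegation idea: the storage copy wraps each storage in a whole-storage tensor view and calls the tensor copy, so cross-device handling lives in one place. `Storage`, `Tensor`, `storage_copy`, `tensor_copy`, `p2p_enabled`, and `cross_device_syncs` are hypothetical stand-ins for cutorch's types and functions; a host `memcpy` replaces the device copy, and the synchronization step is only marked by a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types, not cutorch's THCudaStorage / THCudaTensor. */
#define MAX_DEVICES 8

typedef struct { float *data; size_t size; int device; } Storage;
typedef struct { Storage *storage; size_t offset; size_t numel; } Tensor;

static int p2p_enabled[MAX_DEVICES][MAX_DEVICES];
static int cross_device_syncs;

/* The single place that knows about cross-device copies. */
static void tensor_copy(Tensor *dst, const Tensor *src) {
  assert(dst->numel == src->numel);
  int sd = src->storage->device, dd = dst->storage->device;
  assert(sd >= 0 && sd < MAX_DEVICES && dd >= 0 && dd < MAX_DEVICES);
  if (sd != dd) {
    /* real code would lazily enable peer access for this pair here ... */
    p2p_enabled[sd][dd] = 1;
    /* ... and order the copy after pending work on both devices
       (an assumption: e.g. with CUDA events) */
    cross_device_syncs++;
  }
  memcpy(dst->storage->data + dst->offset,
         src->storage->data + src->offset,
         src->numel * sizeof(float));
}

/* Storage copy no longer has its own copy path: it delegates. */
void storage_copy(Storage *dst, Storage *src) {
  assert(dst->size == src->size);
  Tensor t_dst = { dst, 0, dst->size };
  Tensor t_src = { src, 0, src->size };
  tensor_copy(&t_dst, &t_src);
}
```

Routing both entry points through one function means a fix to P2P setup or synchronization applies to storage copies automatically, which matches the motivation given for the change.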


soumith · 2016-11-26T05:32:30Z

thanks!

soumith merged commit e2051b6 into torch:master Nov 26, 2016

soumith merged 1 commit into torch:master from colesbury:lazy