
Conversation

@colesbury
Member

Previously, CUDAGenerator::CUDAGenerator would initialize the random
number generator on the current device, usually device 0. This is
undesirable because initializing the CUDA context allocates a few hundred
MB due to all the kernels in libTHC.so.

This avoids the unnecessary call to THCRandom_getGenerator() in the
CUDAGenerator constructor.

Fixes #7320
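For illustration, a minimal sketch of the lazy-initialization shape this describes, using stand-in names (`LazyCudaGenerator`, `CudaRngState`, `create_cuda_rng_state`) rather than the actual ATen/THC code: construction stays cheap and context-free, and the expensive call is deferred to first use.

```cpp
#include <memory>

struct CudaRngState {};  // stand-in for the per-device THC generator state

// Stand-in for THCRandom_getGenerator(): creating the real state is what
// forces a CUDA context (and the few hundred MB of libTHC.so kernels).
std::unique_ptr<CudaRngState> create_cuda_rng_state() {
  return std::unique_ptr<CudaRngState>(new CudaRngState());
}

class LazyCudaGenerator {
 public:
  LazyCudaGenerator() = default;  // no CUDA work in the constructor

  CudaRngState& state() {
    if (!state_) {
      state_ = create_cuda_rng_state();  // context created only on first use
    }
    return *state_;
  }

 private:
  std::unique_ptr<CudaRngState> state_;
};
```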

@apaszke
Contributor

apaszke commented May 9, 2018

Can we add a test that checks for this unwanted init? It's not the first time we've hit this regression.

@colesbury
Member Author

@apaszke, that's a good idea, but I'm not sure of a good way to do it. The CUDA driver API provides cuDevicePrimaryCtxGetState(CUdevice dev, unsigned int* flags, int* active). @ngimel is there an equivalent way to check whether the primary context is active using the CUDA runtime API?
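For reference, a minimal sketch of that driver-API check (compile with -lcuda): cuInit() itself does not create a context, so a test could call cuDevicePrimaryCtxGetState() and fail if the primary context on device 0 has unexpectedly become active.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda.h>

// Abort with a message if a driver API call fails; error handling is
// deliberately reduced to a bare check for this sketch.
static void check(CUresult rc, const char* what) {
  if (rc != CUDA_SUCCESS) {
    std::fprintf(stderr, "%s failed with CUresult %d\n", what, static_cast<int>(rc));
    std::exit(1);
  }
}

// Returns true if the primary context on the given device is active.
// cuInit() initializes the driver API but does not create a context itself.
bool primary_context_is_active(int ordinal) {
  check(cuInit(0), "cuInit");
  CUdevice dev;
  check(cuDeviceGet(&dev, ordinal), "cuDeviceGet");
  unsigned int flags = 0;
  int active = 0;
  check(cuDevicePrimaryCtxGetState(dev, &flags, &active), "cuDevicePrimaryCtxGetState");
  return active != 0;
}

int main() {
  std::printf("device 0 primary context active: %d\n", primary_context_is_active(0));
  return 0;
}
```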

@ezyang ezyang merged commit 976b1d5 into pytorch:master May 13, 2018
onnxbot added a commit to onnxbot/onnx-fb-universe that referenced this pull request May 13, 2018
@ngimel
Collaborator

ngimel commented May 13, 2018

@colesbury unfortunately the CUDA runtime API ignores the existence of contexts altogether, so there's no way to query the context through it.

weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018
…ytorch#7392)

Previously, CUDAGenerator::CUDAGenerator would initialize the random
number generator on the current device, usually device 0. This is
undesirable because initializing the CUDA context allocates a few hundred
MB due to all the kernels in libTHC.so.

This avoids the unnecessary call to THCRandom_getGenerator() in the
CUDAGenerator constructor.

Fixes pytorch#7320

* Fix call to get THCState