Closed
Description
When we call an ATen type.randn({n}) function from multiple threads simultaneously, it occasionally fails with:
THCudaCheck FAIL file=/private/home/kaia/FAIR_rush/autogradpp/pytorch/aten/src/THC/generic/THCTensorMathPairwise.cu line=81 error=77 : an illegal memory access was encountered
terminate called after throwing an instance of 'std::runtime_error'
  what(): Creating MTGP kernel state failed. at /private/home/kaia/FAIR_rush/autogradpp/pytorch/aten/src/THC/THCTensorRandom.cu:38
This problem disappears if you generate a random number on each device before you launch any threads.
I suspect it's because this function is not protected by a mutex:
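To illustrate the kind of race I mean, here is a minimal generic sketch. It is not the THC code: `GeneratorState`, `GeneratorRegistry`, `ensure_initialized`, and `run_concurrent` are hypothetical names, and the "expensive setup" is just a counter standing in for creating the per-device generator state. It assumes the state is created lazily on first use, which is consistent with the workaround above but not confirmed by it.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for per-device RNG state (not the real THC struct).
struct GeneratorState {
    bool initialized = false;
    int init_count = 0;  // how many times the one-time setup actually ran
};

class GeneratorRegistry {
public:
    explicit GeneratorRegistry(int num_devices) : states_(num_devices) {}

    // Lazily set up the state for `device`. The lock makes the
    // check-then-initialize sequence atomic, so only one thread runs setup.
    // Without the lock, two threads could both see initialized == false.
    void ensure_initialized(int device) {
        std::lock_guard<std::mutex> lock(mutex_);
        GeneratorState& state = states_[device];
        if (!state.initialized) {
            ++state.init_count;        // placeholder for the expensive setup
            state.initialized = true;
        }
    }

    // Number of times setup ran for `device`.
    int init_count(int device) {
        std::lock_guard<std::mutex> lock(mutex_);
        return states_[device].init_count;
    }

private:
    std::mutex mutex_;
    std::vector<GeneratorState> states_;
};

// Launch `num_threads` threads that all request the same device's state,
// then report how many times that state was set up.
int run_concurrent(GeneratorRegistry& registry, int device, int num_threads) {
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&registry, device] {
            registry.ensure_initialized(device);
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return registry.init_count(device);
}
```

With the lock in place, setup should run once per device no matter how many threads race to it. Calling `ensure_initialized` for every device from the main thread before spawning workers would correspond to the workaround described above.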