Description
In the PyTorch dataloader (CPU sampling), worker processes never initialize a CUDA context, since the CUDA runtime does not support the fork start method (https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing). I think the DGL dataloader should follow this convention, if possible.
However, when creating a COO/CSR matrix, the constructor calls the cudaPointerGetAttributes CUDA runtime API inside the IsPinned function. As a result, each worker process invokes this constructor and ends up initializing/accessing CUDA.
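To illustrate why the convention matters, here is a minimal pure-Python sketch (no real CUDA involved): with the `fork` start method, a worker inherits the parent's process state, which is exactly why a CUDA context created in the parent cannot safely be reused or re-initialized in the child.

```python
# Hypothetical illustration using plain multiprocessing, standing in for
# PyTorch dataloader workers. No CUDA is touched; a dict flag simulates
# "parent already initialized a CUDA context".
import multiprocessing as mp


def worker(state, queue):
    # The forked child inherits the parent's state wholesale; a real CUDA
    # context inherited this way would be unusable in the child.
    queue.put(state["cuda_initialized"])


def main():
    ctx = mp.get_context("fork")  # "fork" is unavailable on Windows
    state = {"cuda_initialized": True}  # parent "initialized" a context
    queue = ctx.Queue()
    p = ctx.Process(target=worker, args=(state, queue))
    p.start()
    result = queue.get()
    p.join()
    return result


if __name__ == "__main__":
    print(main())  # the child inherited the flag: True
```

This is why CPU-sampling workers should simply never touch the CUDA runtime at all.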
Lines 68 to 71 in fedaa36:

```cpp
  is_pinned = (aten::IsNullArray(row) || row.IsPinned()) &&
              (aten::IsNullArray(col) || col.IsPinned()) &&
              (aten::IsNullArray(data) || data.IsPinned());
}
```
This is not a bug and will not error out, since the behavior is guarded by clearing the CUDA error (see below):
dgl/src/runtime/cuda/cuda_device_api.cc, lines 295 to 301 in b35757a:

```cpp
// We don't want to fail in these particular cases since this function
// can be called when users only want to run on CPU even if CUDA API is
// enabled, or in a forked subprocess where CUDA context cannot be
// initialized. So we just mark the CUDA context to unavailable and
// return.
is_available_ = false;
cudaGetLastError();  // clear error
```
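The same guard pattern, sketched in Python for clarity: probe the device API once, and on failure mark it unavailable and swallow the error instead of raising, so CPU-only (or forked) processes keep working. The `probe` callable here is a hypothetical stand-in for the `cudaPointerGetAttributes` call, not DGL's actual API.

```python
# Minimal sketch of "mark unavailable and clear the error" error handling.
class DeviceAPI:
    def __init__(self, probe):
        self.is_available = True
        try:
            probe()  # e.g. a first call into the CUDA runtime
        except RuntimeError:
            # Mirror the C++ code above: mark the device unavailable and
            # swallow the error rather than failing the whole process.
            self.is_available = False


def failing_probe():
    raise RuntimeError("cannot initialize CUDA in forked subprocess")


api = DeviceAPI(failing_probe)
print(api.is_available)  # False: CUDA marked unavailable, no exception raised
```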
Nevertheless, I believe it would still be preferable to adhere to PyTorch's convention by removing the IsPinned call from the COO/CSR matrix constructors.
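One possible shape for the fix (a sketch of an assumed design, not DGL's actual code): compute `is_pinned` lazily on first use instead of in the constructor, so a CPU-only worker that never asks about pinning never touches the CUDA runtime.

```python
# Hypothetical lazy-evaluation sketch. The `pinned` attribute stands in for
# a per-array NDArray::IsPinned() check that would hit the CUDA runtime.
class COOMatrix:
    def __init__(self, row, col):
        self.row, self.col = row, col
        self._is_pinned = None  # not probed yet; constructor stays CUDA-free

    def is_pinned(self):
        if self._is_pinned is None:
            # Probe only on first use, then cache the result.
            self._is_pinned = (getattr(self.row, "pinned", False)
                               and getattr(self.col, "pinned", False))
        return self._is_pinned
```

With this layout, forked dataloader workers that only build and read the matrix on CPU never trigger the probe at all.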
I can put together a PR for the fix later. cc @nv-dlasalle @yaox12 @frozenbugs @TristonC