
[CPU-sampling] Avoid access/init cuda instances at each sampling (child) process #6561

@chang-l

Description


In the PyTorch dataloader (CPU sampling), worker processes never initialize a CUDA context, since the CUDA runtime does not support the fork start method (https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing). I think the DGL dataloader should follow this convention as well, if possible.

However, in DGL, creating a COO/CSR matrix calls the cudaPointerGetAttributes CUDA runtime API inside the IsPinned function. As a result, every worker process that constructs such a matrix ends up initializing/accessing CUDA:

```cpp
is_pinned = (aten::IsNullArray(row) || row.IsPinned()) &&
            (aten::IsNullArray(col) || col.IsPinned()) &&
            (aten::IsNullArray(data) || data.IsPinned());
```
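A minimal, CUDA-free sketch of the problem: the names below (ProbePinned, EagerCSR, g_runtime_calls) are invented for illustration and are not DGL's actual API. The counter stands in for calls into the CUDA runtime, the way cudaPointerGetAttributes is reached from IsPinned():

```cpp
#include <cassert>

// Invented stand-in: counts how often the "CUDA runtime" is probed, the way
// cudaPointerGetAttributes would be from inside IsPinned().
static int g_runtime_calls = 0;

bool ProbePinned() {
  ++g_runtime_calls;  // real code would touch the CUDA runtime here
  return false;       // pretend the buffer is ordinary pageable memory
}

// Eager variant mirroring the current constructor: merely building a matrix
// queries the runtime, so a forked CPU-only worker still initializes CUDA.
struct EagerCSR {
  bool is_pinned;
  EagerCSR() { is_pinned = ProbePinned(); }  // runtime touched at construction
};
```

Constructing an EagerCSR is enough to bump the counter, which is exactly the unwanted side effect in a forked worker.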

This is not a bug and will not error out, because the behavior is guarded by clearing the CUDA error message (see below):

```cpp
// We don't want to fail in these particular cases since this function
// can be called when users only want to run on CPU even if CUDA API is
// enabled, or in a forked subprocess where CUDA context cannot be
// initialized. So we just mark the CUDA context to unavailable and
// return.
is_available_ = false;
cudaGetLastError();  // clear error
```

Nevertheless, I believe it would still be preferable to adhere to PyTorch's convention by removing the IsPinned call from the constructor of the COO/CSR matrix.
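One way the fix could look, sketched with the same invented names as above (ProbePinned, g_runtime_calls; not DGL's real API): defer the pinned check to first use so that construction stays runtime-free and CPU-only workers never touch CUDA.

```cpp
#include <cassert>

// Invented stand-in for calls into the CUDA runtime (cudaPointerGetAttributes).
static int g_runtime_calls = 0;

bool ProbePinned() {
  ++g_runtime_calls;
  return false;
}

// Lazy variant: the constructor does nothing device-related; the pinned
// status is computed at most once, on the first IsPinned() call.
struct LazyCSR {
  mutable bool checked = false;
  mutable bool pinned = false;
  bool IsPinned() const {
    if (!checked) {          // probe the runtime only on first use
      pinned = ProbePinned();
      checked = true;
    }
    return pinned;
  }
};
```

With this shape, a forked sampling worker that only builds and slices matrices never triggers the probe, matching the PyTorch dataloader convention.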

I can put up a PR with the fix later. cc @nv-dlasalle @yaox12 @frozenbugs @TristonC
