GPU 0 context created on GPU 1 worker when using pin_memory=True #58626

@msbaines

🐛 Bug

While trying to enable pin_memory=True in fairseq (facebookresearch/fairseq#3560), I noticed that a GPU 0 context was being created in the GPU 1 worker. I eventually traced this to the following line in dataloader.py:

torch.cuda.current_device(),

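The problem is that fairseq creates a worker thread which, as a side effect of iterating, creates the pin_memory thread. The call to torch.cuda.current_device() is therefore evaluated in that worker thread, not in the main thread: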

https://github.com/pytorch/fairseq/blob/d6855baec88f99ac776962027b91d404fe917eea/fairseq/data/iterators.py#L548

The worker thread gets the default device, GPU 0. I worked around this by calling torch.cuda.set_device(). This was a hard bug to track down, and I'm not sure how to avoid it in general. Perhaps torch could add a threading wrapper, similar to torch.multiprocessing, that would ensure the default device is consistent across threads.
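A minimal sketch of what such a threading wrapper could look like. The class name DeviceInheritingThread and its get_device/set_device parameters are hypothetical and not part of torch; the device accessors are passed in as callables so the sketch stays independent of any particular library.

```python
import threading


class DeviceInheritingThread(threading.Thread):
    """Hypothetical helper: a thread that starts on its parent's current device."""

    def __init__(self, get_device, set_device, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # __init__ runs in the parent thread, so this reads the parent's device.
        self._parent_device = get_device()
        self._set_device = set_device

    def run(self):
        # run() executes in the new thread: select the device before any work.
        self._set_device(self._parent_device)
        super().run()
```

With PyTorch, the callables would presumably be torch.cuda.current_device and torch.cuda.set_device, e.g. DeviceInheritingThread(torch.cuda.current_device, torch.cuda.set_device, target=fn). Any thread started from inside fn would still need the same treatment, since only this one thread is covered.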

Note: The GPU 1 worker calls torch.cuda.set_device() immediately after process creation. The problem is that sub-threads do not inherit the device context of the main thread, because the current device is tracked per thread.
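The behavior can be modeled without a GPU. The sketch below is a stand-in only: it uses threading.local to imitate a per-thread "current device" slot that defaults to 0, and the function names set_device, current_device, and device_seen_by_new_thread are illustrative, not torch APIs.

```python
import threading

# Stand-in for a per-thread "current device" slot (an assumption for illustration).
_state = threading.local()


def set_device(index):
    _state.device = index


def current_device():
    # A thread that never called set_device falls back to device 0.
    return getattr(_state, "device", 0)


def device_seen_by_new_thread():
    seen = []
    t = threading.Thread(target=lambda: seen.append(current_device()))
    t.start()
    t.join()
    return seen[0]


set_device(1)                        # what the GPU 1 worker does at startup
print(current_device())              # main thread: 1
print(device_seen_by_new_thread())   # freshly created thread: 0
```

The second print mirrors the bug: the new thread sees device 0 even though its parent selected device 1.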

cc @ngimel @ssnl @VitalyFedyunin @ejguan

Labels

  • module: cuda (Related to torch.cuda, and CUDA support in general)
  • module: dataloader (Related to torch.utils.data.DataLoader and Sampler)
  • triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
