🐛 Bug
While trying to enable pin_memory=True in fairseq (facebookresearch/fairseq#3560), I noticed that a GPU 0 context was being created in the GPU 1 worker. I eventually root-caused this to the following code in dataloader.py:
pytorch/torch/utils/data/dataloader.py, line 930 in 8cf85a1:

    torch.cuda.current_device(),
The problem is that fairseq creates a worker thread which, as a side effect of iterating the DataLoader, spawns the pin_memory thread.
That worker thread gets the default device, GPU 0, so the pin_memory thread it spawns targets GPU 0 as well. I worked around it by calling torch.cuda.set_device() in that thread, but this was a hard bug to track down, and I am not sure how to avoid it in general. Perhaps torch could add a threading wrapper, similar to torch.multiprocessing, that would ensure the default device is consistent across threads (a sketch follows the note below).
Note: the GPU 1 worker calls torch.cuda.set_device() immediately after process creation. The problem is that sub-threads do not inherit the current device of the main thread.
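This is not an existing PyTorch API, just a minimal sketch of the kind of wrapper suggested above (the CudaThread name and its shape are hypothetical): it captures the creating thread's current device and re-applies it inside the new thread before running the target.

```python
import threading

import torch


class CudaThread(threading.Thread):
    """Hypothetical wrapper: remember the creating thread's current CUDA
    device and re-apply it in the new thread before running the target."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Captured in the creating thread (e.g. the GPU 1 worker's main thread).
        self._device = torch.cuda.current_device() if torch.cuda.is_available() else None

    def run(self):
        # Without this, the new thread starts on the default device (GPU 0),
        # and anything it spawns (such as the pin_memory thread) follows suit.
        if self._device is not None:
            torch.cuda.set_device(self._device)
        super().run()
```

With something like this, fairseq's sub-thread would see GPU 1 as its current device, and the pin_memory thread created while iterating would be pinned to GPU 1 as well.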
To Reproduce
Steps to reproduce the behavior:
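A minimal standalone sketch of the fairseq-like pattern (hypothetical script; assumes a machine with at least two GPUs):

```python
import threading

import torch
from torch.utils.data import DataLoader, TensorDataset


def iterate_loader():
    # Sub-thread: it does NOT inherit the main thread's current device,
    # so current_device() is 0 here even though the main thread set 1.
    print("current device in sub-thread:", torch.cuda.current_device())
    loader = DataLoader(TensorDataset(torch.randn(64, 4)),
                        batch_size=8, num_workers=1, pin_memory=True)
    # Iterating starts the pin_memory thread, which targets GPU 0.
    for _ in loader:
        pass


if __name__ == "__main__":
    # Mimics the fairseq GPU 1 worker process: the main thread pins itself to cuda:1.
    torch.cuda.set_device(1)
    t = threading.Thread(target=iterate_loader)
    t.start()
    t.join()
```

After running this, nvidia-smi shows a context for the process on GPU 0 even though only GPU 1 was ever requested; adding torch.cuda.set_device(1) at the top of iterate_loader makes it go away.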
Expected behavior
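Iterating a DataLoader with pin_memory=True from a process that only uses GPU 1 should not create a CUDA context (or allocate anything) on GPU 0.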
Environment
Please copy and paste the output from our environment collection script (or fill out the checklist below manually).
You can get the script and run it with:
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
- PyTorch Version (e.g., 1.0):
- OS (e.g., Linux):
- How you installed PyTorch (conda, pip, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information: