Closed
Description
I updated my PyTorch installation this afternoon in order to use LayerNorm, and now code that worked fine before is giving me data loader errors after 2 epochs of training. Training is all on one GPU currently.
Traceback (most recent call last):
  File "train.py", line 54, in <module>
    main()
  File "train.py", line 48, in main
    solver.run_wasserstein(num_epochs)
  File "/playpen/meder/projects/point-gan/point_wgan_solver.py", line 551, in run_wasserstein
    sample = data_iter.next()
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 275, in __next__
    idx, batch = self._get_batch()
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 254, in _get_batch
    return self.data_queue.get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 378, in get
    return recv()
  File "/usr/local/lib/python2.7/dist-packages/torch/multiprocessing/queue.py", line 22, in recv
    return pickle.loads(buf)
  File "/usr/lib/python2.7/pickle.py", line 1388, in loads
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/torch/multiprocessing/reductions.py", line 68, in rebuild_storage_fd
    fd = multiprocessing.reduction.rebuild_handle(df)
  File "/usr/lib/python2.7/multiprocessing/reduction.py", line 157, in rebuild_handle
    new_handle = recv_handle(conn)
  File "/usr/lib/python2.7/multiprocessing/reduction.py", line 83, in recv_handle
    return _multiprocessing.recvfd(conn.fileno())
OSError: [Errno 4] Interrupted system call
The only difference in my code between before and after the install is the addition of 3 LayerNorm layers. Perhaps the update introduced a bug?
I'm currently checking whether the same issue occurs without LayerNorm, though I'm not sure why LayerNorm would be the source of the problem.
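For context, the final error in the traceback is errno 4 (EINTR), which means a blocking system call was interrupted by a signal before it completed. Python 2.7 surfaces this as an exception, whereas Python 3.5 and later retry such calls automatically (PEP 475). Below is a minimal sketch of the generic retry-on-EINTR pattern. The helper `retry_on_eintr`, its `max_retries` parameter, and the `flaky_get` stand-in are hypothetical names made up for illustration and are not part of PyTorch or of my code. I have not verified that wrapping the data loader call this way is safe or that it addresses the underlying cause.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func, retrying while it fails with EINTR (hypothetical helper)."""
    max_retries = kwargs.pop("max_retries", 10)
    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except (OSError, IOError) as exc:
            # Only swallow "Interrupted system call"; re-raise anything else,
            # and give up once the retry budget is spent.
            if exc.errno != errno.EINTR or attempt == max_retries:
                raise


# Stand-in for a blocking call: fails twice with EINTR, then succeeds.
calls = {"count": 0}


def flaky_get():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise OSError(errno.EINTR, "Interrupted system call")
    return "batch"


result = retry_on_eintr(flaky_get)
```

The retry budget is there so that a call which keeps getting interrupted eventually raises instead of looping forever.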
- OS: Ubuntu 16.04
- PyTorch version: GitHub source (as of ~6pm EST)
- How you installed PyTorch (conda, pip, source): source
- Python version: 2.7
- CUDA/cuDNN version: 9.0
- GPU models and configuration: Titan X & 2 Titan Z
- GCC version (if compiling from source): 5.4.0