RuntimeError: DataLoader worker (pid 23616) is killed by signal: Terminated. #4507

@Jensen-Su

Here is the error output:

Traceback (most recent call last):
  File "train_multi_task.py", line 192, in <module>
    experiment(config)
  File "train_multi_task.py", line 82, in experiment
    data_list.append(iter_.next())
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 267, in __next__
    idx, batch = self._get_batch()
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 246, in _get_batch
    return self.data_queue.get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
  File "/usr/local/lib/python2.7/dist-packages/torch/multiprocessing/queue.py", line 22, in recv
    return pickle.loads(buf)
  File "/usr/lib/python2.7/pickle.py", line 1382, in loads
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/torch/multiprocessing/reductions.py", line 68, in rebuild_storage_fd
    fd = multiprocessing.reduction.rebuild_handle(df)
  File "/usr/lib/python2.7/multiprocessing/reduction.py", line 157, in rebuild_handle
    new_handle = recv_handle(conn)
  File "/usr/lib/python2.7/multiprocessing/reduction.py", line 83, in recv_handle
    return _multiprocessing.recvfd(conn.fileno())
OSError: [Errno 4] Interrupted system call
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib/python2.7/multiprocessing/util.py", line 325, in _exit_function
Error in sys.exitfunc:
Exception RuntimeError: RuntimeError('DataLoader worker (pid 23604) is killed by signal: Terminated.',) in <bound method DataLoaderIter.__del__ of <torch.utils.data.dataloader.DataLoaderIter object at 0x7f73ba2c8850>> ignored
Exception IOError: IOError(104, 'Connection reset by peer') in <bound method DataLoaderIter.__del__ of <torch.utils.data.dataloader.DataLoaderIter object at 0x7f73ba2c1890>> ignored
Traceback (most recent call last):
  File "/usr/lib/python2.7/atexit.py", line 30, in _run_exitfuncs
    traceback.print_exc()
  File "/usr/lib/python2.7/traceback.py", line 233, in print_exc
    print_exception(etype, value, tb, limit, file)
  File "/usr/lib/python2.7/traceback.py", line 125, in print_exception
    print_tb(tb, limit, file)
  File "/usr/lib/python2.7/traceback.py", line 69, in print_tb
    line = linecache.getline(filename, lineno, f.f_globals)
  File "/usr/lib/python2.7/linecache.py", line 14, in getline
    lines = getlines(filename, module_globals)
  File "/usr/lib/python2.7/linecache.py", line 40, in getlines
    return updatecache(filename, module_globals)
  File "/usr/lib/python2.7/linecache.py", line 133, in updatecache
    lines = fp.readlines()
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 172, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 23616) is killed by signal: Terminated.

I'm lost...
