Fix multiprocessing and dataloader tests on Windows #4453
test/test_multiprocessing.py
Outdated
self.assertEqual(storage_size, 5)

@unittest.skipIf(IS_WINDOWS, 'NYI: not supported on Windows')
# @unittest.skipIf(IS_WINDOWS, 'NYI: not supported on Windows')
|
I'm not so sure that it works now. @yf225, do you know when the CI will be working again? If Windows CI can be set up quickly, then I'd rather have it tested on CI. Otherwise, I'll test it on my own PC. |
|
@peterjc123 Windows CI will be enabled again after this fix #4469. We can test it on CI. |
|
@pytorchbot retest this please |
|
@apaszke It seems that the timeout options in |
|
Let’s multiply the timeout for Windows only |
This reverts commit 27010d4.
|
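A minimal sketch of the "multiply the timeout for Windows only" suggestion above. The helper name `scaled_timeout` and the factor of 5 are hypothetical, chosen only for illustration; the thread does not say which factor or which timeout constants the PR actually uses.

```python
import sys

# Platform flag, defined here so the sketch is self-contained.
IS_WINDOWS = sys.platform == "win32"

# Hypothetical multiplier: the real value used in the PR is not shown in this thread.
WINDOWS_TIMEOUT_FACTOR = 5


def scaled_timeout(base_seconds, is_windows=IS_WINDOWS, factor=WINDOWS_TIMEOUT_FACTOR):
    """Return the base timeout, multiplied by `factor` on Windows only."""
    return base_seconds * factor if is_windows else base_seconds
```

A test could then wait with `process.join(scaled_timeout(3.0))`, so other platforms keep the original limit while Windows, where starting a child process is slower, gets more time.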
@pytorchbot test this please |
|
Thank you @peterjc123! |
|
I'm curious. How does Windows build pass the |
|
@ssnl No, it won't, because there's no fork on Windows. Although you initialized the CUDA context for one tensor in the parent, the child process gets a brand-new context because it is started with spawn, so the segfault won't be triggered. |
|
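The fork versus spawn difference described above can be illustrated without CUDA. This is a sketch under assumptions: the module-level `STATE` dictionary is a hypothetical stand-in for per-process state such as a CUDA context, and the function names are made up for this example.

```python
import multiprocessing as mp

# Hypothetical stand-in for per-process state such as a CUDA context.
STATE = {"initialized": False}


def _report(queue):
    # Runs in the child: report what the child sees in its own copy of STATE.
    queue.put(STATE["initialized"])


def child_sees_parent_state(start_method):
    """Set STATE in the parent, then ask a child process whether it sees the change."""
    ctx = mp.get_context(start_method)
    STATE["initialized"] = True
    queue = ctx.Queue()
    proc = ctx.Process(target=_report, args=(queue,))
    proc.start()
    result = queue.get(timeout=30)
    proc.join()
    return result
```

When called from a script under an `if __name__ == "__main__":` guard, `child_sees_parent_state("spawn")` should return `False`, because the spawned child re-imports the module and starts from fresh state. On POSIX, `child_sees_parent_state("fork")` should return `True`, because the child inherits a copy of the parent's memory.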
@peterjc123 I'm still a bit confused. Are you saying that ctypes.string_at(0) doesn't segfault on Windows because child processes get new CUDA contexts due to spawn? |
|
@ssnl Sorry, I messed that up with
Process Process-1:
Traceback (most recent call last):
  File "C:\Anaconda2\envs\test_new\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Anaconda2\envs\test_new\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "D:\pytorch\mp_dl_test.py", line 107, in _test_segfault
    _ = next(iter(dataloader))
  File "C:\Anaconda2\envs\test_new\lib\site-packages\torch\utils\data\dataloader.py", line 273, in __next__
    return self._process_next_batch(batch)
  File "C:\Anaconda2\envs\test_new\lib\site-packages\torch\utils\data\dataloader.py", line 293, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
OSError: Traceback (most recent call last):
  File "C:\Anaconda2\envs\test_new\lib\site-packages\torch\utils\data\dataloader.py", line 56, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "C:\Anaconda2\envs\test_new\lib\site-packages\torch\utils\data\dataloader.py", line 56, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "D:\pytorch\mp_dl_test.py", line 48, in __getitem__
    return ctypes.string_at(0)
  File "C:\Anaconda2\envs\test_new\lib\ctypes\__init__.py", line 492, in string_at
    return _string_at(ptr, size)
OSError: exception: access violation reading 0x0000000000000000
The
def _worker_loop(dataset, index_queue, data_queue, collate_fn, seed, init_fn, worker_id):
    global _use_shared_memory
    _use_shared_memory = True
    # Initialize C side signal handlers for SIGBUS and SIGSEGV. Python signal
    # module's handlers are executed after Python returns from C low-level
    # handlers, likely when the same fatal signal happened again already.
    # https://docs.python.org/3/library/signal.html Sec. 18.8.1.1
    _set_worker_signal_handlers()
    torch.set_num_threads(1)
    torch.manual_seed(seed)
    if init_fn is not None:
        init_fn(worker_id)
    while True:
        r = index_queue.get()
        if r is None:
            break
        idx, batch_indices = r
        try:
            samples = collate_fn([dataset[i] for i in batch_indices])  # <- the exception is thrown here
        except Exception:
            data_queue.put((idx, ExceptionWrapper(sys.exc_info())))  # <- the exception is returned here
        else:
            data_queue.put((idx, samples))
After the exception was thrown, the |
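The catch-and-forward pattern in the `_worker_loop` excerpt, together with the `raise batch.exc_type(batch.exc_msg)` line in the traceback, can be sketched in a single process as follows. This is a simplified illustration, not the actual `torch.utils.data` code: it uses `queue.Queue` instead of multiprocessing queues, omits `collate_fn`, seeding and signal handling, and `next_batch` is a hypothetical name.

```python
import queue
import sys
import traceback


class ExceptionWrapper:
    """Picklable record of an exception: its type plus the formatted traceback."""

    def __init__(self, exc_info):
        self.exc_type = exc_info[0]
        self.exc_msg = "".join(traceback.format_exception(*exc_info))


def worker_loop(dataset, index_queue, data_queue):
    """Fetch batches until a None sentinel arrives; forward errors instead of dying."""
    while True:
        r = index_queue.get()
        if r is None:
            break
        idx, batch_indices = r
        try:
            samples = [dataset[i] for i in batch_indices]
        except Exception:
            # Send the failure to the consumer rather than crashing the worker.
            data_queue.put((idx, ExceptionWrapper(sys.exc_info())))
        else:
            data_queue.put((idx, samples))


def next_batch(data_queue):
    """Consumer side: re-raise a forwarded exception with the worker's traceback text."""
    idx, batch = data_queue.get()
    if isinstance(batch, ExceptionWrapper):
        raise batch.exc_type(batch.exc_msg)
    return batch
```

Because the access violation on Windows surfaces as an ordinary `OSError`, it takes the `except Exception` path and reaches the main process as a re-raised error, which matches the nested traceback shown in the comment above.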
|
@ssnl However, if the |
|
@peterjc123 I see. Interesting that Windows can throw an OSError upon invalid memory access. On POSIX, we had to do some signal handling to manually capture such issues. :) |
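A small sketch of the behaviour discussed above, guarded so that it only attempts the invalid read on Windows. The function name `read_null_pointer` is hypothetical. On Windows, ctypes reports the access violation as an `OSError`, as the traceback in this thread shows; on POSIX the same read would normally kill the interpreter with SIGSEGV, so the sketch deliberately skips it there.

```python
import ctypes
import sys


def read_null_pointer():
    """Try to read a C string at address 0 and describe what happened."""
    if sys.platform != "win32":
        # On POSIX this read would raise SIGSEGV and terminate the process,
        # which is why the DataLoader workers install C-level signal handlers.
        return "skipped: reading address 0 would crash the process on this platform"
    try:
        ctypes.string_at(0)
    except OSError as exc:
        # Windows structured exception handling surfaces the fault as OSError.
        return "caught: {}".format(exc)
    return "no error"
```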
While the other missing components are optional, this one is important. The DataLoader on Windows is not very useful as it stands, so we have to fix this.