Fixed non-determinate preprocessing on DataLoader #4640
Conversation
Thanks! Could you add a test in
Personally I don't think that relying on iterator consumption order is a good idea. One accident
@ssnl Done
ssnl left a comment
Sorry for the late review. Thanks for the contribution. I think after addressing the comments and resolving the conflict, this will be good to go :).
Force-pushed b3fa129 to 7c429f9
@pytorchbot add to whitelist
Sorry I forgot to enable CI on this PR. Should be testing now.
@AlexanderRadionov do you have any benchmarks for the new data loader? @colesbury and I are worried that it will lower the throughput, and throughput is important for training on e.g. 8x Volta GPUs.
I made some synthetic tests, just iterating over the dataloader, and I have not seen any issues. I ran it with … I think my changes should not cause a slowdown because the individual worker queue behavior is not the default.
I'm thinking more of a real ImageNet-like workload that actually has to read from disk, and some requests may get stalled for longer than others, slowing down the whole loading process (unlike previously).
I tried to emulate the read-time difference in my original code posted at the head of the merge request (a random sleep). Here are the results:
Do you think this is a problem for the non-default behavior? @colesbury @apaszke
@apaszke Yes, the process will be slowed. But having reproducible results is very important. Do you have suggestions on how to do deterministic multiprocess data loading in a faster way?
@AlexanderRadionov what was the duration of the sleep you've used? I'd still want to try this on a real machine before merging, but I think it's ok. The only change I'd request would be to remove the flag, and make this mode always enabled.
@apaszke I used a random sleep of [0.001; 0.003] s. I also tested on a real dataset: 96265 images, 8.7 GB total.
LGTM. I just want to run a real-world ImageNet benchmark, and then this should be good to merge.
@apaszke I don't have enough resources to run an ImageNet benchmark.
@AlexanderRadionov no worries, I wasn't requesting that from you. I was going to do it myself.
Just FYI, another strategy for ensuring determinism is storing PRNG states in shared memory. It doesn't require fixing worker assignment, though its implementation is complex and serialization/deserialization overheads are incurred. A PR to Chainer, chainer/chainer#4230, uses this strategy, because fixing process assignment is impossible in Chainer as it uses
@grafi-tt I'm not sure how this would work. You need to ensure that the sequence of locks for the PRNG will be the same at every run, but this is very hard or prohibitively slow if you sample multiple times while loading a data item (e.g. sample a mask, sample a rotation, ...).
@apaszke It's simple: cache the state. When a worker process is resumed, a PRNG state is deserialized from shared memory (by e.g.
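A much-simplified sketch of that direction, assuming per-sample seeds pre-generated by the parent process and placed in shared memory (the Chainer PR's actual mechanism, which serializes and restores full PRNG states, is more involved; all names here are hypothetical):

```python
# Simplified illustration only: per-sample PRNG seeds live in shared memory,
# so the augmentation for sample i is identical regardless of which worker
# handles it or in what order the workers finish.
import multiprocessing as mp
import numpy as np

_shared_seeds = None  # set inside each worker by the pool initializer

def _init_worker(shared_seeds):
    global _shared_seeds
    _shared_seeds = shared_seeds

def load_sample(idx):
    # "Deserialize" the per-sample state and use it for the random augmentation.
    rng = np.random.RandomState(_shared_seeds[idx])
    return rng.rand()

if __name__ == "__main__":
    n = 8
    master = np.random.RandomState(0)
    seeds = mp.Array("l", [int(s) for s in master.randint(0, 2**31 - 1, size=n)], lock=False)
    with mp.Pool(4, initializer=_init_worker, initargs=(seeds,)) as pool:
        print(pool.map(load_sample, range(n)))  # same output on every run
```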
@grafi-tt oh I see. This makes sense but is still not ideal:
Force-pushed f39fd15 to 5f13c51
@pytorchbot test this please
@pytorchbot retest this please
3 similar comments
@pytorchbot retest this please
@pytorchbot retest this please
@pytorchbot retest this please
@pytorchbot retest this please
* upstream/master: (663 commits)
  Fix "command not found" error in perf test (pytorch#5982)
  add pip mkl-devel to the error message when mkl is found but mkl headers are not (pytorch#5984)
  Support batch LowerCholeskyTransform (pytorch#5980)
  Linearly interpolating upsampling fix (pytorch#5927)
  Store perf numbers in S3 (pytorch#5951)
  Modidy setup docs for Windows (pytorch#5981)
  Group Normalization (pytorch#5968)
  [distributions] Implement Power transform (pytorch#5976)
  Disable TestBottleneck test_cuda on Windows (pytorch#5977)
  Fix crash when cat-ing empty cuda tensors (pytorch#5971)
  Update no_unions flag for nanopb gen and update ONNX proto files (pytorch#5972)
  Expose gradients w.r.t. input & weight for conv1d, conv2d, conv3d in Python (pytorch#5408)
  Fixed non-determinate preprocessing on DataLoader (pytorch#4640)
  add AVX2 implementation for sigmoid function (pytorch#5010)
  Implement torch.util.bottleneck (pytorch#5216)
  Remove pragma once from cpp file (pytorch#5965)
  fix mvn docs (pytorch#5967)
  Fix incorrect rendering of Tensor.index_*_ doc examples. (pytorch#5969)
  Implement range for loop in script (pytorch#5827)
  Add windows doc (pytorch#5859)
  ...

# Conflicts:
#   aten/src/TH/generic/THTensorMath.c
#   torch/_tensor_docs.py
#   torch/csrc/generic/methods/TensorCompare.cwrap
is this on pytorch 3.1?
no
@ssnl then where is this?
It’s on master but not in a release yet.
@ssnl when will it be released? It'd be really nice to have determinism in the data loader.
I am on mobile so I can't link it, but you can search the issue list; there is a tracking issue for the 0.4 release. There are still some rough edges to smooth out, and although I don't know the exact timing, we are actively working on it. That is why I suggested building from source on your forum post. It is not hard at all to apply the changes from this PR yourself either, since IIRC this PR is entirely in Python.
@AlexanderRadionov @ssnl @apaszke I agree that having reproducible results is very important, but that doesn't mean you have to synchronize workers in the dataloader. Multiprocessing is non-deterministic by nature, and if you impose some unnatural synchronization there must be a performance trade-off, whether it's trivial or not. The synchronization also makes the code more complicated and harder to maintain. I think what you really need is a deterministic dataset, rather than a deterministic data loader.
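For what it's worth, a minimal sketch of that "deterministic dataset" idea, deriving all augmentation randomness from the sample index plus a fixed base seed (a hypothetical example, not code from this PR):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class DeterministicAugmentedDataset(Dataset):
    """Augmentation randomness comes from (base_seed, index), so the result is
    the same no matter which worker processes the sample or in what order."""

    def __init__(self, data, base_seed=0):
        self.data = data          # e.g. a list or array of samples
        self.base_seed = base_seed

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        rng = np.random.RandomState(self.base_seed + idx)  # per-sample RNG
        noise = rng.normal(size=self.data[idx].shape)       # stand-in for augmentation
        return torch.as_tensor(self.data[idx] + 0.01 * noise)
```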
Also:
* Single worker limitation not needed anymore, been fixed in PyTorch since v0.4.0 (pytorch/pytorch#4640)
* compress_classifier.py: If run in evaluation mode (--eval), enable deterministic mode.
* Call utils.set_deterministic at data loaders creation if deterministic argument is set (don't assume user calls it outside)
* Disable CUDNN benchmark mode in utils.set_deterministic (https://pytorch.org/docs/stable/notes/randomness.html#cudnn)
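For reference, a helper along the lines that commit message describes might look roughly like this (an illustrative sketch, not Distiller's actual utils.set_deterministic):

```python
import random
import numpy as np
import torch

def set_deterministic(seed=0):
    # Seed every RNG the data pipeline and model may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducibility in cuDNN, per
    # https://pytorch.org/docs/stable/notes/randomness.html#cudnn
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```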
Added the ind_worker_queue parameter to data.DataLoader. It makes preprocessing deterministic.
DataLoader in multiprocessing mode may cause non-deterministic results. Even if the random_seed is frozen, each subprocess may get its tasks in an unstable order. This is caused by differing I/O times while the data loads. If you use augmentation while loading data, this makes results unreproducible. See https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087
To fix this issue I have added an individual queue for each worker. In this case each worker gets its tasks in a stable order, so each subprocess produces stable results (a rough sketch of the idea follows).
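A rough illustration of the scheduling change (simplified; not the actual dataloader.py diff): with a single shared index queue, whichever worker frees up first takes the next index, so the index-to-worker mapping depends on I/O timing; with one queue per worker and round-robin assignment, index i always goes to worker i % num_workers.

```python
# Round-robin assignment of sample indices to per-worker queues (simplified sketch).
num_workers = 4
indices = list(range(12))  # indices produced by the sampler for one epoch

per_worker_queues = {w: [] for w in range(num_workers)}
for send_position, sample_idx in enumerate(indices):
    per_worker_queues[send_position % num_workers].append(sample_idx)

print(per_worker_queues)
# {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6, 10], 3: [3, 7, 11]}
# Each worker now sees a fixed stream of indices, so its RNG is consumed in a
# reproducible order even when individual loads take different amounts of time.
```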
To reproduce the issue you may change ind_worker_queue to False and run the script several times. Code to reproduce the issue:
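The original reproduction script is not preserved in this thread. A minimal sketch of the kind of test being described (a random sleep to emulate I/O jitter plus a random augmentation, with two runs compared) might look like the following; whether the two runs actually differ depends on the PyTorch version, since this PR changes how indices are assigned to workers:

```python
import random
import time
from functools import partial

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class JitteryAugmentedDataset(Dataset):
    """Emulates variable I/O time plus a random augmentation per sample."""

    def __len__(self):
        return 64

    def __getitem__(self, idx):
        time.sleep(random.uniform(0.001, 0.003))  # fake, variable "disk read"
        return idx, np.random.rand()              # augmentation drawn from the worker's RNG

def _seed_worker(worker_id, base_seed):
    np.random.seed(base_seed + worker_id)         # fixed per-worker seed

def run_epoch(seed=0):
    torch.manual_seed(seed)
    loader = DataLoader(JitteryAugmentedDataset(), batch_size=8, num_workers=4,
                        worker_init_fn=partial(_seed_worker, base_seed=seed))
    return [batch[1].clone() for batch in loader]

if __name__ == "__main__":
    a, b = run_epoch(), run_epoch()
    # With unstable index-to-worker assignment, these can differ between runs
    # even though every seed above is fixed.
    print("identical:", all(torch.equal(x, y) for x, y in zip(a, b)))
```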