-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Update determinism documentation #41692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update determinism documentation #41692
Conversation
d2ebeb8 to
f2c9443
Compare
💊 CI failures summary and remediationsAs of commit 19e9a1f (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
f2c9443 to
b8d9356
Compare
docs/source/notes/randomness.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going back to what we expect the user to know, would most users know they're using cuDNN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed "cuDNN's convolution" to be just "CUDA's convolution". Do you think that's clear enough?
docs/source/torch.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not being very familiar with the history here, but is there already something that will generate the doc for set_deterministic and is_deterministic once they're added to the rst?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but I'm not aware if so. They don't get automatically added to the generated doc associated with this rst file if I remove these two lines.
|
Hey @kurtamohler, I appreciate you're trying to improve the current Reproducibility note, which hasn't had a significant update in over a long time. I think it's going to be a little tricky to update, however, because it is so out of date. I'm also not sure it was ever accomplishing its goal. That is, the note starts by warning users not to expect determinism:
And that's reasonable. But then it says this:
And I don't think this is true, right?. That is, if a user performed the actions in the doc they might not actually get a deterministic computation. It would require our own native ops, MKL algorithms, distributed algorithms, cuDNN algorithms (even with the cuDNN deterministic flags enabled), etc., actually be deterministic, but as the note itself discusses later this isn't always the case. So my first questions are:
It seems like after #41538 many native non-deterministic algorithms will correctly throw an error if the deterministic flag is set, and after #41377 some cuDNN algorithms will, too, but are we confident we've identified all the nondeterministic native operations and do we know if the algorithms we use from math libraries, like oneDNN (MKL) or MAGMA or LAPACK are deterministic or not? (LAPACK should be, I'm not sure about MAGMA). If not, what claims can we make today? Looking forward to hearing your thoughts. We should probably also use this revision of the Reproducibility note to correct inconsistencies like "There are two pseudorandom number generators involved in PyTorch, which you will need to seed manually to make runs reproducible... You can use torch.manual_seed() to seed the RNG for all devices (both CPU and CUDA)." |
|
It looks like I missed @mruberry's comment here when I wrote my comment at #15359 (comment) which hashes out similar things. |
0620265 to
48c87ba
Compare
48c87ba to
f83bed2
Compare
|
I believe I've addressed all the major points brought up in the discussions about determinism. A few changes are:
|
f83bed2 to
87de193
Compare
2320775 to
ce565dc
Compare
docs/source/notes/randomness.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment above is basically suggesting this paragraph be pulled up.
|
Hey @kurtamohler, I had an opportunity to read the latest and I think it's really good and a huge improvement over the existing doc. I made a few more suggestions but they're minor and mostly about reordering the existing paragraphs or making slight grammar changes. Let me know your thoughts and then, unless you'd like to make more changes, I think we should ship this. |
ce565dc to
729822f
Compare
|
Thanks @mruberry , I've made those last few changes |
docs/source/notes/randomness.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we kill this paragraph? The value is True by default, right, so users don't actually need to do anything?
If we don't kill it, instead of saying "beneficial" can we clarify by saying "performance might improve"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, looks like it's False by default. I can change it to "performance might improve".
docs/source/notes/randomness.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is missing something. Something like, "is called with sparse-dense CUDA tensors it typically uses a non-deterministic algorithm, but..."
|
Just two minor issues with the updated language. |
729822f to
19e9a1f
Compare
mruberry
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…4140) Summary: This PR fulfills Ed's request (#41692 (comment)) for a strategy to keep the functions that have nondeterministic alerts fully documented. Part of #15359 Pull Request resolved: #44140 Reviewed By: colesbury Differential Revision: D23644469 Pulled By: ezyang fbshipit-source-id: 60936ccced13f071c620f7d25ef6dcbca338de7f
…4140) Summary: This PR fulfills Ed's request (#41692 (comment)) for a strategy to keep the functions that have nondeterministic alerts fully documented. Part of #15359 Pull Request resolved: #44140 Reviewed By: colesbury Differential Revision: D23644469 Pulled By: ezyang fbshipit-source-id: 60936ccced13f071c620f7d25ef6dcbca338de7f
Add user-facing documentation for set_deterministic
Also update grammar and readability in Reproducibility page
Issue #15359