Skip to content

Conversation

@kurtamohler
Copy link
Collaborator

Add user-facing documentation for set_deterministic
Also update grammar and readability in Reproducibility page

Issue #15359

@kurtamohler kurtamohler requested review from ezyang and t-vi July 20, 2020 19:34
@kurtamohler kurtamohler force-pushed the deterministic-flag-docs-15359 branch from d2ebeb8 to f2c9443 Compare July 20, 2020 19:38
@dr-ci
Copy link

dr-ci bot commented Jul 20, 2020

💊 CI failures summary and remediations

As of commit 19e9a1f (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_6_clang9_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Aug 31 20:41:29 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/workspace/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:8:19: error: use of undeclared identifier \'strtod_l\'\n return ((int*)(&strtod_l))[argc];\n ^\n1 error generated.\n" }
Aug 31 20:41:29 + [[ pytorch-linux-bionic-py3.6-clang9-build != *xla* ]] 
Aug 31 20:41:29 ++ git status --porcelain 
Aug 31 20:41:29 + git_status= 
Aug 31 20:41:29 + [[ -n '' ]] 
Aug 31 20:41:29 + [[ pytorch-linux-bionic-py3.6-clang9-build == *xla* ]] 
Aug 31 20:41:29 + cleanup 
Aug 31 20:41:29 + retcode=0 
Aug 31 20:41:29 + set +x 
Aug 31 20:41:29 EXITED_USER_LAND 
Aug 31 20:41:29 =================== sccache compilation log =================== 
Aug 31 20:41:29 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/workspace/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:8:19: error: use of undeclared identifier \'strtod_l\'\n  return ((int*)(&strtod_l))[argc];\n                  ^\n1 error generated.\n" } 
Aug 31 20:41:29  
Aug 31 20:41:29 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Aug 31 20:41:29 Compile requests              6109 
Aug 31 20:41:29 Compile requests executed     3613 
Aug 31 20:41:29 Cache hits                    3597 
Aug 31 20:41:29 Cache misses                     0 
Aug 31 20:41:29 Cache timeouts                   0 
Aug 31 20:41:29 Cache read errors                0 
Aug 31 20:41:29 Forced recaches                  0 
Aug 31 20:41:29 Cache write errors               0 

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build binary_windows_libtorch_3_7_cpu_debug_build (1/1)

Step: "Checkout code" (full log | diagnosis details | 🔁 rerun) ❄️

Writing SSH key for checkout to id_rsa
Creating .ssh directory
Adding the following entries to known_hosts:
github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
bitbucket.org ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAubiN81eDcafrgMeLzaFPsw2kNvEcqTKl/VqLat/MaB33pZy0y3rJZtnqwR2qOOvbwKZYKiEO1O6VqNEBxKvJJelCq0dTXWT5pbO2gDXC6h6QDXCaHo6pOHGPUy+YBaGQRGuSusMEASYiWunYN0vCAI8QaXnWMXNMdFP3jHAJH0eDsoiGnLPBlBp4TNm6rYI74nMzgz3B9IikW4WVK+dc8KZJZWYjAuORU3jc1c/NPskD2ASinf8v3xnfXeukU0sJ5N6m5E8VLjObPEO+mN2t/FZTMZLiFqPWc/ALSqnMnnhwrNi2rbfg/rd/IpL8Le3pSBne8+seeFVBoGqzHM9yXw==

Writing SSH key for checkout to id_rsa

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 51 times.

@gchanan gchanan requested a review from mruberry July 21, 2020 01:26
@gchanan gchanan added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jul 21, 2020
@kurtamohler kurtamohler force-pushed the deterministic-flag-docs-15359 branch from f2c9443 to b8d9356 Compare July 22, 2020 00:10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going back to what we expect the user to know, would most users know they're using cuDNN?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed "cuDNN's convolution" to be just "CUDA's convolution". Do you think that's clear enough?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not being very familiar with the history here, but is there already something that will generate the doc for set_deterministic and is_deterministic once they're added to the rst?

Copy link
Collaborator Author

@kurtamohler kurtamohler Aug 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but I'm not aware if so. They don't get automatically added to the generated doc associated with this rst file if I remove these two lines.

@mruberry
Copy link
Collaborator

mruberry commented Jul 22, 2020

Hey @kurtamohler, I appreciate you're trying to improve the current Reproducibility note, which hasn't had a significant update in over a long time. I think it's going to be a little tricky to update, however, because it is so out of date. I'm also not sure it was ever accomplishing its goal. That is, the note starts by warning users not to expect determinism:

Completely reproducible results are not guaranteed across PyTorch releases, individual commits or different platforms. Furthermore, results need not be reproducible between CPU and GPU executions, even when using identical seeds.

And that's reasonable. But then it says this:

However, in order to make computations deterministic on your specific problem on one specific platform and PyTorch release, there are a couple of steps to take.

And I don't think this is true, right?. That is, if a user performed the actions in the doc they might not actually get a deterministic computation. It would require our own native ops, MKL algorithms, distributed algorithms, cuDNN algorithms (even with the cuDNN deterministic flags enabled), etc., actually be deterministic, but as the note itself discusses later this isn't always the case.

So my first questions are:

  • What do we want to tell users about determinism in PyTorch?
  • What kind of determinism, if any, is achievable for end-users and how do we expect them to get it?

It seems like after #41538 many native non-deterministic algorithms will correctly throw an error if the deterministic flag is set, and after #41377 some cuDNN algorithms will, too, but are we confident we've identified all the nondeterministic native operations and do we know if the algorithms we use from math libraries, like oneDNN (MKL) or MAGMA or LAPACK are deterministic or not? (LAPACK should be, I'm not sure about MAGMA). If not, what claims can we make today?

Looking forward to hearing your thoughts. We should probably also use this revision of the Reproducibility note to correct inconsistencies like "There are two pseudorandom number generators involved in PyTorch, which you will need to seed manually to make runs reproducible... You can use torch.manual_seed() to seed the RNG for all devices (both CPU and CUDA)."

@ezyang
Copy link
Contributor

ezyang commented Jul 22, 2020

It looks like I missed @mruberry's comment here when I wrote my comment at #15359 (comment) which hashes out similar things.

@kurtamohler kurtamohler marked this pull request as draft August 6, 2020 00:51
@kurtamohler kurtamohler force-pushed the deterministic-flag-docs-15359 branch from 0620265 to 48c87ba Compare August 18, 2020 19:41
@kurtamohler kurtamohler changed the title Add documentation for set_deterministic Update determinism documentation Aug 18, 2020
@kurtamohler kurtamohler marked this pull request as ready for review August 18, 2020 19:57
@kurtamohler kurtamohler force-pushed the deterministic-flag-docs-15359 branch from 48c87ba to f83bed2 Compare August 18, 2020 20:06
@kurtamohler
Copy link
Collaborator Author

I believe I've addressed all the major points brought up in the discussions about determinism. A few changes are:

  • Don't claim that we offer full determinism
  • Provide a list of all operations affected by torch.set_deterministic()
  • Remove/replace a few details in randomness.rst that are not part of the user-facing API

@kurtamohler kurtamohler force-pushed the deterministic-flag-docs-15359 branch from f83bed2 to 87de193 Compare August 18, 2020 21:49
@ezyang
Copy link
Contributor

ezyang commented Aug 19, 2020

This looks good. However, I'll give @mruberry and @gchanan a chance to take a look.

@kurtamohler kurtamohler force-pushed the deterministic-flag-docs-15359 branch 3 times, most recently from 2320775 to ce565dc Compare August 28, 2020 22:43
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment above is basically suggesting this paragraph be pulled up.

@mruberry
Copy link
Collaborator

Hey @kurtamohler, I had an opportunity to read the latest and I think it's really good and a huge improvement over the existing doc. I made a few more suggestions but they're minor and mostly about reordering the existing paragraphs or making slight grammar changes. Let me know your thoughts and then, unless you'd like to make more changes, I think we should ship this.

@kurtamohler kurtamohler force-pushed the deterministic-flag-docs-15359 branch from ce565dc to 729822f Compare August 31, 2020 20:06
@kurtamohler
Copy link
Collaborator Author

Thanks @mruberry , I've made those last few changes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we kill this paragraph? The value is True by default, right, so users don't actually need to do anything?

If we don't kill it, instead of saying "beneficial" can we clarify by saying "performance might improve"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looks like it's False by default. I can change it to "performance might improve".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is missing something. Something like, "is called with sparse-dense CUDA tensors it typically uses a non-deterministic algorithm, but..."

@mruberry
Copy link
Collaborator

Just two minor issues with the updated language.

@kurtamohler kurtamohler force-pushed the deterministic-flag-docs-15359 branch from 729822f to 19e9a1f Compare August 31, 2020 20:23
@mruberry mruberry self-requested a review August 31, 2020 20:25
Copy link
Collaborator

@mruberry mruberry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mruberry merged this pull request in d7ee84c.

facebook-github-bot pushed a commit that referenced this pull request Sep 14, 2020
…4140)

Summary:
This PR fulfills Ed's request (#41692 (comment)) for a strategy to keep the functions that have nondeterministic alerts fully documented.

Part of #15359

Pull Request resolved: #44140

Reviewed By: colesbury

Differential Revision: D23644469

Pulled By: ezyang

fbshipit-source-id: 60936ccced13f071c620f7d25ef6dcbca338de7f
xuzhao9 pushed a commit that referenced this pull request Sep 18, 2020
…4140)

Summary:
This PR fulfills Ed's request (#41692 (comment)) for a strategy to keep the functions that have nondeterministic alerts fully documented.

Part of #15359

Pull Request resolved: #44140

Reviewed By: colesbury

Differential Revision: D23644469

Pulled By: ezyang

fbshipit-source-id: 60936ccced13f071c620f7d25ef6dcbca338de7f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants