-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Implement scatter reductions (CUDA), remove divide/subtract #41977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💊 CI failures summary and remediationsAs of commit d6dd7b0 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group. This comment has been revised 68 times. |
aocsa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job! @v0dro, I really enjoyed reading/reviewing this PR about scatter reduce on CUDA. I put a couple of general comments and questions.
|
xla error is real, can you please skip part of the test checking for runtime error on xla? Also, rocm build interestingly times out on scatter test, so it's hard to say if it's related or not. |
|
Hm, rocm build times out on a scatter test, that's worrying. Can you disable it on rocm? |
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
OK everything passing except the facebook internal tests. |
Summary: Fixes #33394 . This PR does two things: 1. Implement CUDA scatter reductions with revamped GPU atomic operations. 2. Remove support for divide and subtract for CPU reduction as was discussed with ngimel . I've also updated the docs to reflect the existence of only multiply and add. Pull Request resolved: #41977 Reviewed By: mruberry Differential Revision: D23748888 Pulled By: ngimel fbshipit-source-id: ea643c0da03c9058e433de96db02b503514c4e9c
|
Why changes to submodules were pulled as part of this PR? (namely, it reverted #44706) |
Summary:
- Bump oneDNN (mkl-dnn) to 1.6 for bug fixes
- Fixes pytorch#42446. RuntimeError: label is redefined for convolutions with large filter size on Intel AVX512
- Implemented workaround for internal compiler error when building oneDNN with Microsoft Visual Studio 2019 (pytorch#43169)
Restore pytorch#44706 which was reverted by pytorch#41977
Summary: Restore #44706, which should workaround VC compiler crash, which was reverted by #41977 Update configs to use ":stable" Windows images Pull Request resolved: #44746 Reviewed By: walterddr Differential Revision: D23793682 Pulled By: malfet fbshipit-source-id: bfdc36c35b920f58798a18c15642ec7efc68f00e
|
Is there an option to ignore/sum over several of the output dimensions? |
Noticed that `cuda_atomic_ops_test` wasn't added to `run_test.sh` and hence hasn't been running in CI since it was added in #41977. [ghstack-poisoned]
Noticed that `cuda_atomic_ops_test` wasn't added to `run_test.sh` and hence hasn't been running in CI since it was added in #41977. [ghstack-poisoned]
Noticed that `cuda_atomic_ops_test` wasn't added to `run_test.sh` and hence hasn't been running in CI since it was added in #41977. [ghstack-poisoned]
Noticed that `cuda_atomic_ops_test` wasn't added to `run_test.sh` and hence hasn't been running in CI since it was added in #41977. [ghstack-poisoned]
Fixes #33394 .
This PR does two things:
I've also updated the docs to reflect the existence of only multiply and add.