Implement scatter reductions (CUDA), remove divide/subtract #41977

v0dro · 2020-07-24T05:09:00Z

Fixes #33394.

This PR does two things:

1. Implement CUDA scatter reductions with revamped GPU atomic operations.
2. Remove support for divide and subtract for CPU reduction, as discussed with @ngimel.

I've also updated the docs to reflect the existence of only multiply and add.
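
For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what a one-dimensional scatter with a reduction computes once only add and multiply remain. The helper name `scatter_reduce_1d` and its argument names are hypothetical and chosen for illustration; this is not PyTorch's API or the CUDA kernel from this PR, which operates on tensors and combines colliding updates with GPU atomics. The loop below is sequential and therefore deterministic, whereas atomic updates on a GPU may be applied in a different order from run to run.

```python
import operator

# Hypothetical reference helper for illustration only; not part of PyTorch.
# Only the two reductions kept by this PR are modelled.
_REDUCERS = {"add": operator.add, "multiply": operator.mul}


def scatter_reduce_1d(dest, index, src, reduce):
    """Return a copy of `dest` where, for each i, out[index[i]] is combined
    with src[i] using the chosen reduction."""
    if reduce not in _REDUCERS:
        raise ValueError(f"unsupported reduce {reduce!r}; expected 'add' or 'multiply'")
    if len(index) > len(src):
        raise ValueError("index must not be longer than src")
    op = _REDUCERS[reduce]
    out = list(dest)
    for i, target in enumerate(index):
        if not 0 <= target < len(out):
            raise IndexError(f"index {target} is out of range for size {len(out)}")
        # Several source elements may hit the same target; they accumulate.
        out[target] = op(out[target], src[i])
    return out


base = [1.0, 1.0, 1.0, 1.0]
index = [0, 2, 2, 0]
src = [2.0, 3.0, 4.0, 5.0]

added = scatter_reduce_1d(base, index, src, "add")            # [8.0, 1.0, 8.0, 1.0]
multiplied = scatter_reduce_1d(base, index, src, "multiply")  # [10.0, 1.0, 12.0, 1.0]
```

Positions 0 and 2 each receive two source values, which is the collision case the atomic operations are there to handle; requesting `"subtract"` or `"divide"` raises an error in this sketch, mirroring their removal.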


dr-ci · 2020-07-24T06:18:29Z

💊 CI failures summary and remediations

As of commit d6dd7b0 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚


aocsa

Nice job, @v0dro! I really enjoyed reading and reviewing this PR about scatter reduce on CUDA. I left a couple of general comments and questions.

aten/src/ATen/native/cuda/AtomicOps.cuh


ngimel · 2020-09-12T03:06:54Z

The XLA error is real; can you please skip the part of the test that checks for the runtime error on XLA? Also, the ROCm build, interestingly, times out on the scatter test, so it's hard to say whether it's related or not.

ngimel · 2020-09-16T20:38:22Z

Hm, the ROCm build times out on a scatter test, which is worrying. Can you disable it on ROCm?

facebook-github-bot

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

v0dro · 2020-09-17T04:18:52Z

OK, everything is passing except the Facebook internal tests.

facebook-github-bot · 2020-09-17T08:13:47Z

@ngimel merged this pull request in e18a221.

Summary:
Fixes #33394.

This PR does two things:

1. Implement CUDA scatter reductions with revamped GPU atomic operations.
2. Remove support for divide and subtract for CPU reduction as was discussed with ngimel.

I've also updated the docs to reflect the existence of only multiply and add.

Pull Request resolved: #41977
Reviewed By: mruberry
Differential Revision: D23748888
Pulled By: ngimel
fbshipit-source-id: ea643c0da03c9058e433de96db02b503514c4e9c

malfet · 2020-09-18T23:42:42Z

Why were changes to submodules pulled in as part of this PR? (Namely, it reverted #44706.)

Summary:
- Bump oneDNN (mkl-dnn) to 1.6 for bug fixes
- Fixes pytorch#42446. RuntimeError: label is redefined for convolutions with large filter size on Intel AVX512
- Implemented workaround for internal compiler error when building oneDNN with Microsoft Visual Studio 2019 (pytorch#43169)

Restore pytorch#44706 which was reverted by pytorch#41977

Summary:
Restore #44706, which should workaround VC compiler crash, which was reverted by #41977
Update configs to use ":stable" Windows images
Pull Request resolved: #44746
Reviewed By: walterddr
Differential Revision: D23793682
Pulled By: malfet
fbshipit-source-id: bfdc36c35b920f58798a18c15642ec7efc68f00e

Summary:
Revert accidental gloo submodule changes in #41977
Pull Request resolved: #45008
Reviewed By: malfet
Differential Revision: D23799892
Pulled By: ngimel
fbshipit-source-id: e8dab244c6abad32ed60efe3c26cab40837e57c8

sbb-gh · 2020-11-03T20:47:47Z

Is there an option to ignore/sum over several of the output dimensions?
https://discuss.pytorch.org/t/scatter-add-reduce-output-dimensions-shape/100427


v0dro added 16 commits July 10, 2020 23:25

tensor advanced indexing updates

4140827

CUDA kernel scaffolding

03eb15f

update reduction kernels

5b1036a

functor implementation works

4b6061a

start adding new THC atomics file

e0894dd

start working on brand new ATEN scatter CUDA GPU atomics implementation:

8160070

addition atomic operation

3c3a3da

atomics ops template specializatio works for basic add stuff

add739a

update atomic operations with addtion and subtraction

289d344

update atomic ops with multiplciation skeleton

cf4e4c6

Merge branch 'master' of github.com:pytorch/pytorch into scatter-reduce-cuda

d74202f

update templates to accept function specializations

7b45ddd

update kernel names

b0cb716

remove subtract and divide scatter operations

e20526d

finish CUDA scatter reduction

c693658

Merge branch 'master' of github.com:pytorch/pytorch into scatter-reduce-cuda

3808f78

pytorchbot added the open source label Jul 24, 2020

rgommers requested a review from aocsa July 24, 2020 10:02

aocsa reviewed Jul 24, 2020

aten/src/ATen/native/cuda/AtomicOps.cuh (outdated, resolved)

aten/src/ATen/native/cuda/AtomicOps.cuh (outdated, resolved)

Merge branch 'master' of github.com:pytorch/pytorch into scatter-reduce-cuda

3c04686

rgommers mentioned this pull request Jul 25, 2020

Non-deterministic scatter reduction algorithms for scatter operations for CUDA (sum, subtract, divide, multiply). #33394

Closed

rgommers changed the title ~~Scatter reduce cuda~~ Implement scatter reductions (CUDA), remove divide/subtract Jul 25, 2020

ngimel self-requested a review July 26, 2020 18:17

ngimel added the triaged label Jul 26, 2020

Merge branch 'master' of github.com:pytorch/pytorch into scatter-reduce-cuda

8aeeebf

ngimel reviewed Jul 31, 2020

aten/src/ATen/native/cuda/AtomicOps.cuh (outdated, resolved)

aten/src/ATen/native/cuda/AtomicOps.cuh (outdated, resolved)

aten/src/ATen/native/cuda/AtomicOps.cuh (outdated, resolved)

v0dro added 3 commits July 30, 2020 20:38

add tests for CUDA atomic operations

531a4b0

update tests to cover more types

4870be8

correct tests to factor multiple data types

b4ab66e

v0dro added 2 commits September 10, 2020 23:32

Merge branch 'master' of github.com:pytorch/pytorch into scatter-reduce-cuda

338b6bd

update test for scatter reduce

ed7e399

v0dro added 3 commits September 15, 2020 21:46

Merge master

0cbc2bf

update test to skip ROCM if CUDA

9ec827b

only run on CPU and CUDA

b15b3c8

skip ROCM tests

d6dd7b0

facebook-github-bot reviewed Sep 17, 2020


ngimel approved these changes Sep 17, 2020


facebook-github-bot closed this in e18a221 Sep 17, 2020

facebook-github-bot added the merged label Sep 17, 2020

malfet mentioned this pull request Sep 18, 2020

Update Windows builders to latest VS2019 #44746

Closed

ngimel mentioned this pull request Sep 19, 2020

update gloo submodule #45008

Closed

v0dro mentioned this pull request Sep 19, 2020

scatter_ supporting different reduction modes #22378

Closed

mruberry added the Merged label Oct 28, 2020

mikaylagawarecki mentioned this pull request Mar 22, 2022

Add cuda_atomic_ops_test to run_tests.sh #74482

Closed

mikaylagawarecki added a commit that referenced this pull request Mar 23, 2022

Update on "Add cuda_atomic_ops_test to run_tests.sh"

0aec3a1

Noticed that `cuda_atomic_ops_test` wasn't added to `run_test.sh` and hence hasn't been running in CI since it was added in #41977. [ghstack-poisoned]

mikaylagawarecki added a commit that referenced this pull request Mar 24, 2022

Update on "Add cuda_atomic_ops_test to run_tests.sh"

3383afa


mikaylagawarecki added a commit that referenced this pull request Mar 24, 2022

Update on "Add cuda_atomic_ops_test to run_tests.sh"

2510726

