Use cub::DeviceSelect::UniqueByKey for EmbeddingBackward #68376

zasdfgbnm · 2021-11-15T19:45:01Z

NVIDIA/cub#405 is still under review, and its API might change before it lands in cub 1.16. Please wait for NVIDIA/cub#405 to be merged before merging this PR. I tested locally and the tests pass.
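For readers unfamiliar with the operation named in the title, here is a minimal CPU sketch of "unique by key" semantics: for each run of consecutive equal keys, keep the first key and the value paired with it. This is a host-side reference only, not the cub API and not the code in this PR. The names `unique_by_key_reference` and `segment_offsets` are hypothetical, and using positions as values to recover segment start offsets is an assumption about how such a primitive might be applied to sorted embedding indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of the reference routine: one (key, value) pair per run of equal keys.
struct UniqueByKeyResult {
  std::vector<int64_t> keys;
  std::vector<int64_t> values;
};

// CPU reference (hypothetical helper, not the cub API): for each run of
// consecutive equal keys, keep the first key and its paired value.
UniqueByKeyResult unique_by_key_reference(const std::vector<int64_t>& keys,
                                          const std::vector<int64_t>& values) {
  assert(keys.size() == values.size());
  UniqueByKeyResult out;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (i == 0 || keys[i] != keys[i - 1]) {
      out.keys.push_back(keys[i]);
      out.values.push_back(values[i]);
    }
  }
  return out;
}

// Assumed usage: pair each sorted index with its position, so the surviving
// values are the start offsets of each segment of equal indices.
std::vector<int64_t> segment_offsets(const std::vector<int64_t>& sorted_indices) {
  std::vector<int64_t> positions(sorted_indices.size());
  for (std::size_t i = 0; i < positions.size(); ++i) {
    positions[i] = static_cast<int64_t>(i);
  }
  return unique_by_key_reference(sorted_indices, positions).values;
}
```

With sorted indices `{0, 0, 2, 2, 2, 5}`, the runs start at positions 0, 2, and 5, so `segment_offsets` returns those three offsets.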

pytorch-probot · 2021-11-15T19:45:04Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/5b6a8258b89970247b6141a23d008abde1b1fba9/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/trunk`	✅ triggered
linux-docs	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/docs`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-vulkan-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
docker-builds	`ciflow/all`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`, `ciflow/trunk`	🚫 skipped
linux-docs-push	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-11-py3-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.1-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
periodic-win-vs2019-cuda11.5-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:

# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot · 2021-11-15T19:45:07Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/68376
📄 Preview docs built from this PR
📄 Preview C++ docs built from this PR
🔧 Opt-in to CIFlow to control what jobs run on your PRs

💊 CI failures summary and remediations

As of commit 14830df (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


zasdfgbnm · 2022-01-24T19:23:32Z

The cub PR is merged; this is ready for review.

ngimel · 2022-02-09T20:07:48Z

The Windows build failure seems to be real.

zasdfgbnm · 2022-02-10T06:11:24Z

@ngimel The Windows error should be fixed now. It occurred because MSVC does not support using `#if` inside a macro argument.
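The MSVC limitation mentioned above can be illustrated with a small sketch. The C++ standard leaves preprocessing directives inside a macro's argument list undefined; GCC and Clang tolerate them, while MSVC rejects them. A portable workaround is to hoist the conditional out of the macro call. All names here (`APPLY`, `USE_FAST_PATH`, `SELECTED_PATH`, `fast_path`, `slow_path`) are hypothetical stand-ins, not identifiers from this PR.

```cpp
#include <cassert>

// Hypothetical function-like macro standing in for any macro that takes code
// as an argument.
#define APPLY(expr) (expr)

// Hypothetical feature flag.
#define USE_FAST_PATH 1

inline int fast_path() { return 1; }
inline int slow_path() { return 2; }

// Non-portable form (shown as a comment only; MSVC fails to compile it):
//
//   int r = APPLY(
//   #if USE_FAST_PATH
//       fast_path()
//   #else
//       slow_path()
//   #endif
//   );

// Portable form: evaluate the conditional outside the macro invocation and
// pass the already-selected expression in.
#if USE_FAST_PATH
#define SELECTED_PATH() fast_path()
#else
#define SELECTED_PATH() slow_path()
#endif

inline int run_selected() { return APPLY(SELECTED_PATH()); }
```

The same idea works by wrapping the whole macro invocation in `#if`/`#else` and writing it twice, which is more verbose but avoids introducing an extra macro.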

zasdfgbnm · 2022-03-08T02:22:28Z

ping @ngimel

facebook-github-bot · 2022-03-08T03:00:16Z

@ngimel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary: NVIDIA/cub#405 is still under review, API might change before it finally lands into cub 1.16, please wait for NVIDIA/cub#405 before merging this. Tested locally and tests pass.
Pull Request resolved: #68376
Reviewed By: bdhirsh
Differential Revision: D34706782
Pulled By: ngimel
fbshipit-source-id: a465d39bc24354d1047af1ee85be05a1de361c86

github-actions · 2022-03-08T22:36:05Z

Hey @zasdfgbnm.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

Summary: NVIDIA/cub#405 is still under review, API might change before it finally lands into cub 1.16, please wait for NVIDIA/cub#405 before merging this. Tested locally and tests pass.
Pull Request resolved: pytorch/pytorch#68376
Reviewed By: bdhirsh
Differential Revision: D34706782
Pulled By: ngimel
fbshipit-source-id: a465d39bc24354d1047af1ee85be05a1de361c86
(cherry picked from commit 68a69bbc5093fd12b1fbfd561b3a10baf5d3e5ba)

Summary: This together with #66580 and #68376 will remove all syncs in embedding. This PR includes #68376, please review after merging #68376

This PR introduces perf regressions and increases memory usage:
- `exclusive_sum` is now computing the entire `numel` elements instead of `num_of_segments` elements, and the trailing `numel - num_of_segments` results will be discarded.
- Some memory allocation now needs `numel` spaces instead of `num_of_segments` or `num_of_partial_segments`.

These are the prices we must pay in order to get a sync-free implementation. I haven't done any benchmark yet. I will do it later.
Pull Request resolved: #70943
Reviewed By: H-Huang
Differential Revision: D34881660
Pulled By: ngimel
fbshipit-source-id: b0760fa33608c46cd4145ceb09878bf94a9f959d
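The trade-off described in that summary can be sketched on the CPU. When the segment count is only known on the device, the host cannot size buffers by it without a sync, so it sizes them by the upper bound `numel` and ignores the tail. This works for an exclusive sum because entry `i` depends only on the entries before `i`, so the first `num_of_segments` results are unaffected by whatever follows. The sketch below is an illustration under assumptions, not the PR's kernel code: `padded_to` and `same_prefix` are hypothetical helpers, and zero padding is chosen only to keep the example deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
std::vector<int64_t> exclusive_sum(const std::vector<int64_t>& in) {
  std::vector<int64_t> out(in.size());
  int64_t running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = running;
    running += in[i];
  }
  return out;
}

// Hypothetical helper: copy v into a buffer of length numel (the upper bound
// on the number of segments), filling the tail with zeros.
std::vector<int64_t> padded_to(const std::vector<int64_t>& v, std::size_t numel) {
  assert(v.size() <= numel);
  std::vector<int64_t> out(numel, 0);
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = v[i];
  }
  return out;
}

// Hypothetical helper: true if the first n entries of a and b agree.
bool same_prefix(const std::vector<int64_t>& a,
                 const std::vector<int64_t>& b, std::size_t n) {
  if (a.size() < n || b.size() < n) return false;
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] != b[i]) return false;
  }
  return true;
}
```

For segment sizes `{2, 3, 1}` and `numel = 6`, scanning the padded buffer does six elements of work instead of three, and the first three results agree with the exact-size scan; the remaining three are the discarded tail.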

Summary: This together with #66580 and #68376 will remove all syncs in embedding. This PR includes #68376, please review after merging #68376

This PR introduces perf regressions and increases memory usage:
- `exclusive_sum` is now computing the entire `numel` elements instead of `num_of_segments` elements, and the trailing `numel - num_of_segments` results will be discarded.
- Some memory allocation now needs `numel` spaces instead of `num_of_segments` or `num_of_partial_segments`.

These are the prices we must pay in order to get a sync-free implementation. I haven't done any benchmark yet. I will do it later.
Pull Request resolved: #70943
Reviewed By: H-Huang
Differential Revision: D34881660
Pulled By: ngimel
fbshipit-source-id: b0760fa33608c46cd4145ceb09878bf94a9f959d
(cherry picked from commit d959fa4)

Use cub::DeviceSelect::UniqueByKey for EmbeddingBackward

a113fba

pytorch-probot bot added the ciflow/default label Nov 15, 2021

facebook-github-bot added the cla signed label Nov 15, 2021

pytorchbot added the open source label Nov 15, 2021

zasdfgbnm requested a review from ngimel November 15, 2021 19:50

zasdfgbnm added 3 commits November 15, 2021 12:09

thrust::discard_iterator

8b21b1a

Merge branch 'master' into unique-by-key

f6f05d0

Merge branch 'master' of github.com:pytorch/pytorch into unique-by-key

5b6a825

zasdfgbnm mentioned this pull request Jan 6, 2022

Remove sync in embedding #70943

Closed

zasdfgbnm added 3 commits January 24, 2022 11:20

Merge branch 'master' of github.com:pytorch/pytorch into unique-by-key

9d8a856

Merge branch 'unique-by-key' of github.com:pytorch/pytorch into unique-by-key

a57ff86

save

8a38207

zasdfgbnm marked this pull request as ready for review January 24, 2022 19:23

anjali411 added the triaged label Jan 25, 2022

zasdfgbnm added 3 commits February 9, 2022 18:06

fix msvc

8b4bf92

Merge branch 'master' of github.com:pytorch/pytorch into unique-by-key

d689c04

Merge branch 'master' of github.com:pytorch/pytorch into unique-by-key

92cacff

Merge branch 'master' of github.com:pytorch/pytorch into unique-by-key

14830df

ngimel added the ciflow/cuda label Mar 8, 2022

ngimel approved these changes Mar 8, 2022

pytorchmergebot closed this in ee8d7d8 Mar 8, 2022

zasdfgbnm deleted the unique-by-key branch March 8, 2022 22:39

WBobby mentioned this pull request Aug 17, 2022

Add ROCm5.2.3/AMDGPU support for PyTorch WBobby/pytorch#2

Closed
