Use cub::DeviceSelect::UniqueByKey for EmbeddingBackward #68376
The cub PR is merged; this is ready for review.
The Windows build failure seems to be real.
@ngimel The Windows error should be fixed. It was because MSVC does not support using
ping @ngimel
@ngimel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: NVIDIA/cub#405 is still under review, and the API might change before it finally lands in cub 1.16. Please wait for NVIDIA/cub#405 before merging this.
Tested locally and tests pass.
Pull Request resolved: #68376
Reviewed By: bdhirsh
Differential Revision: D34706782
Pulled By: ngimel
fbshipit-source-id: a465d39bc24354d1047af1ee85be05a1de361c86
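For readers unfamiliar with the primitive named in the title, here is a minimal CPU sketch of "unique by key" semantics: for each run of consecutive equal keys, keep the first key and the value paired with it. This is not the code from this PR and not cub's implementation. The function name `unique_by_key_reference` is hypothetical, and the example data (sorted indices as keys, positions as values) is only an assumption about how such a primitive could be used to find segment starts.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, for illustration only.
// For each run of consecutive equal keys, keep the first key and the value
// paired with it. Returns the number of selected items.
std::size_t unique_by_key_reference(const std::vector<int64_t>& keys_in,
                                    const std::vector<int64_t>& values_in,
                                    std::vector<int64_t>& keys_out,
                                    std::vector<int64_t>& values_out) {
  keys_out.clear();
  values_out.clear();
  for (std::size_t i = 0; i < keys_in.size(); ++i) {
    // A new run starts at position 0 or wherever the key changes.
    if (i == 0 || keys_in[i] != keys_in[i - 1]) {
      keys_out.push_back(keys_in[i]);
      values_out.push_back(values_in[i]);
    }
  }
  return keys_out.size();
}
```

With sorted keys `{1, 1, 3, 3, 3, 7}` and values `{0, 1, 2, 3, 4, 5}` (the positions), the selected values are the start offsets of each run of equal keys. On the GPU, the device-side primitive writes the selected count to device memory rather than returning it to the host.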
Cherry-picked copy of the pytorch/pytorch#68376 commit (same summary; cherry picked from commit 68a69bbc5093fd12b1fbfd561b3a10baf5d3e5ba).
Summary: This, together with #66580 and #68376, will remove all syncs in embedding. This PR includes #68376; please review it after #68376 is merged.

This PR introduces perf regressions and increases memory usage:
- `exclusive_sum` now runs over all `numel` elements instead of only `num_of_segments` elements, and the trailing `numel - num_of_segments` results are discarded.
- Some memory allocations now need space for `numel` elements instead of `num_of_segments` or `num_of_partial_segments`.

These are the prices we must pay in order to get a sync-free implementation. I haven't done any benchmarks yet; I will do them later.

Pull Request resolved: #70943
Reviewed By: H-Huang
Differential Revision: D34881660
Pulled By: ngimel
fbshipit-source-id: b0760fa33608c46cd4145ceb09878bf94a9f959d
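The trade-off described in the #70943 summary can be illustrated with a small CPU sketch. It assumes the segment count is known only on the device, so the host sizes the scan by the upper bound `numel` instead of reading the count back (which would require a sync). The names `exclusive_sum_reference`, `sizes_padded`, and the zero padding are hypothetical and chosen only for illustration; this is not the code from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for an exclusive prefix sum:
// out[i] = in[0] + ... + in[i - 1], with out[0] = 0.
std::vector<int64_t> exclusive_sum_reference(const std::vector<int64_t>& in) {
  std::vector<int64_t> out(in.size());
  int64_t running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = running;
    running += in[i];
  }
  return out;
}

// Scan a buffer sized for numel elements of which only the first
// num_segments entries are meaningful, then keep only the meaningful
// prefix. The trailing numel - num_segments results are computed and
// thrown away.
std::vector<int64_t> scan_then_discard(const std::vector<int64_t>& sizes_padded,
                                       std::size_t num_segments) {
  std::vector<int64_t> full = exclusive_sum_reference(sizes_padded);
  return std::vector<int64_t>(full.begin(), full.begin() + num_segments);
}
```

Because each exclusive-sum output depends only on the elements before it, the first `num_of_segments` results are the same whether the scan covers `num_of_segments` or `numel` elements. The extra cost is the wasted work and the larger buffers, which matches the regressions listed in the summary.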
Cherry-picked copy of the #70943 commit (same summary; cherry picked from commit d959fa4).