Use cub 1.15's latest scan-by-key algorithm to replace thrust for Embedding.cu and EmbeddingBag.cu #66580
This reverts commit 02e9ca2.
OK, I trust that your internal CUDA 11.6 tests are passing?
@ngimel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
I ran the tests locally many times, both with CUDA 11.6 and with nightly cub. After merging, our nightly CI will start testing this daily with CUDA 11.6.
Hey @zasdfgbnm.
…edding.cu and EmbeddingBag.cu (#66580)
Summary: Pull Request resolved: pytorch/pytorch#66580
Reviewed By: mruberry
Differential Revision: D34116388
Pulled By: ngimel
fbshipit-source-id: 2e8936ca7c10f96a8e7a5696248f56bf87290d6e
(cherry picked from commit 51cff8cb1de725bca52d5137b01b16d054b95f63)
Summary: This together with #66580 and #68376 will remove all syncs in embedding.
This PR includes #68376; please review after merging #68376.
This PR introduces perf regressions and increases memory usage:
- `exclusive_sum` now computes over all `numel` elements instead of `num_of_segments` elements, and the trailing `numel - num_of_segments` results are discarded.
- Some memory allocations now need space for `numel` elements instead of `num_of_segments` or `num_of_partial_segments`.
These are the prices we must pay in order to get a sync-free implementation.
I haven't done any benchmarks yet. I will do them later.
Pull Request resolved: #70943
Reviewed By: H-Huang
Differential Revision: D34881660
Pulled By: ngimel
fbshipit-source-id: b0760fa33608c46cd4145ceb09878bf94a9f959d
(cherry picked from commit d959fa4)
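The trade-off described in the commit message above can be illustrated with a small host-side C++ sketch. This is not the PR's CUDA code: the helper names `padded_segment_sizes` and `exclusive_sum` are hypothetical, and the sketch only assumes that segments are runs of equal keys in a sorted index array. The point it shows is that when the segment count is not read back to the host, the buffers are sized for `numel` and the scan covers all `numel` entries, with only the leading `num_of_segments` results being meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: count the length of each run of equal keys in a
// sorted key array. The output buffer has numel entries (the worst case,
// where every key is distinct); entries past the real segment count stay 0.
std::vector<long> padded_segment_sizes(const std::vector<long>& sorted_keys) {
    std::vector<long> sizes(sorted_keys.size(), 0);
    std::size_t seg = 0;
    for (std::size_t i = 0; i < sorted_keys.size(); ++i) {
        if (i > 0 && sorted_keys[i] != sorted_keys[i - 1]) {
            ++seg;  // a new run of keys starts here
        }
        ++sizes[seg];
    }
    return sizes;
}

// Hypothetical helper: exclusive prefix sum over the first `count` entries.
std::vector<long> exclusive_sum(const std::vector<long>& sizes, std::size_t count) {
    assert(count <= sizes.size());
    std::vector<long> offsets(count);
    long running = 0;
    for (std::size_t i = 0; i < count; ++i) {
        offsets[i] = running;  // sum of all entries before position i
        running += sizes[i];
    }
    return offsets;
}
```

Scanning all `numel` entries gives the same leading offsets as scanning only the first `num_of_segments` entries; the extra trailing results are simply ignored. On a GPU, the benefit of the longer scan is that the host never has to wait for the device to report the segment count before launching the next step.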