[CUDA] Only use vec128 if CUDA version is newer than 12.8 #150705
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -486,7 +486,9 @@ inline C10_HOST_DEVICE int can_vectorize_up_to(const char *pointer) { | |
| uint64_t address = reinterpret_cast<uint64_t>(pointer); | ||
| constexpr int vec2_alignment = std::alignment_of_v<aligned_vector<scalar_t, 2>>; | ||
| constexpr int vec4_alignment = std::alignment_of_v<aligned_vector<scalar_t, 4>>; | ||
| +#if defined(USE_ROCM) || (defined(CUDA_VERSION) && CUDA_VERSION >= 12080) | ||
| constexpr int vec8_alignment = std::alignment_of_v<aligned_vector<scalar_t, 8>>; | ||
| +#endif | ||
| #ifdef USE_ROCM | ||
| constexpr int vec16_alignment = std::alignment_of_v<aligned_vector<scalar_t, 16>>; | ||
| constexpr int type_size = sizeof(scalar_t); | ||
|
|
@@ -495,7 +497,7 @@ inline C10_HOST_DEVICE int can_vectorize_up_to(const char *pointer) { | |
| } else if (type_size <= 2 && (address % vec8_alignment == 0)) { | ||
| return 8; | ||
| } else | ||
| -#else | ||
| +#elif defined(CUDA_VERSION) && CUDA_VERSION >= 12080 | ||
|
Contributor
Shouldn't there be some logic to handle the case when CUDA_VERSION < 12080?
Contributor
Hi @ZainRizvi, this is basically redoing #145746, but only if CUDA >= 12.8.
Contributor
Hence this code should not be applied by default, but only for CUDA 12.8+. |
||
| if (address % vec8_alignment == 0) { | ||
| return 8; | ||
| } else | ||
|
|
||
Should `USE_ROCM` here also be inverted if the `CUDA_VERSION` condition is `>= 12080`?
No, I don't think so. Before #145746, `vec8_alignment` was only available under `USE_ROCM`; after it, it was enabled unconditionally. I want it to be enabled for either ROCm or CUDA newer than 12.6.
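For readers following the thread, here is a minimal host-side sketch of the guard being discussed. It is not the PyTorch source. `can_vectorize_up_to_sketch` and its `cuda_version` template parameter are hypothetical names, with `cuda_version` standing in for the `CUDA_VERSION` macro so that both configurations can be compared in one translation unit. The `aligned_vector` layout and the 4/2/1 fallback chain are assumptions, since the diff does not show them, and the ROCm branch is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Stand-in for the aligned_vector type referenced in the diff (layout assumed).
template <typename scalar_t, int vec_size>
struct alignas(sizeof(scalar_t) * vec_size) aligned_vector {
  scalar_t val[vec_size];
};

// cuda_version is a hypothetical template parameter that models the
// CUDA_VERSION macro (12080 encodes CUDA 12.8).
template <typename scalar_t, int cuda_version>
int can_vectorize_up_to_sketch(const char* pointer) {
  assert(pointer != nullptr);  // sketch-only sanity check, not in the diff
  const auto address = reinterpret_cast<std::uintptr_t>(pointer);
  constexpr int vec2_alignment = std::alignment_of_v<aligned_vector<scalar_t, 2>>;
  constexpr int vec4_alignment = std::alignment_of_v<aligned_vector<scalar_t, 4>>;
  // The 8-wide path is only considered when the toolkit is 12.8 or newer.
  if constexpr (cuda_version >= 12080) {
    constexpr int vec8_alignment = std::alignment_of_v<aligned_vector<scalar_t, 8>>;
    if (address % vec8_alignment == 0) {
      return 8;
    }
  }
  // Assumed fallback chain; the diff does not show these lines.
  if (address % vec4_alignment == 0) return 4;
  if (address % vec2_alignment == 0) return 2;
  return 1;
}
```

With a 64-byte-aligned `float` buffer, the sketch returns 8 when `cuda_version` is 12080 and falls back to 4 when it is 12060, which is the behaviour the second hunk is meant to restore for older toolkits.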