3580 bug investigate test failures when running with cuda 126 #3588
Merged: christophe-murphy merged 7 commits into master from 3580-bug-investigate-test-failures-when-running-with-cuda-126 on Jan 9, 2025
…rp primitives and calls the new primitives for CUDA versions greater than 9 and the old ones for older CUDA versions. The new primitives take an additional argument: a mask of the warp threads participating in the operation. The old primitives always involve all threads in a warp. The wrapper routines originally accepted a mask, which was ignored for the old primitives, but this parameter has now been removed. The reason is that with an old version of CUDA all threads must enter the wrapper routine, whereas with a new version only the threads named in the mask must enter. If threads outside the mask enter the routine, the behavior is undefined: in CUDA versions <= 12.2 the primitive executes without reporting an error, but later versions throw a warp illegal instruction exception. To keep the behavior of these wrapper functions the same across old and new CUDA versions, the mask is now always set to all threads in a warp for the new primitives. Code that needs a custom mask can still call the new primitive directly, as is already done elsewhere in the reduce_by_key routine.
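The wrapper pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration and not ArrayFire's actual code: the names `shfl_down_all`, `warp_sum` and `FULL_MASK` are hypothetical, and the `CUDART_VERSION >= 9000` check is an assumption about where the version boundary sits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical constant: every one of the 32 lanes participates.
constexpr unsigned FULL_MASK = 0xffffffffu;

// Hypothetical wrapper: no mask parameter, so every thread in the warp
// must call it, regardless of which CUDA version is in use.
template <typename T>
__device__ T shfl_down_all(T var, unsigned delta) {
#if CUDART_VERSION >= 9000  // assumed boundary for the *_sync primitives
    return __shfl_down_sync(FULL_MASK, var, delta);
#else
    return __shfl_down(var, delta);  // legacy primitive, implicit full warp
#endif
}

// Sums 32 values within one warp; lane 0 ends up holding the total.
__global__ void warp_sum(const int* in, int* out) {
    int v = in[threadIdx.x];
    for (unsigned d = 16; d > 0; d >>= 1) {
        v += shfl_down_all(v, d);  // all 32 lanes enter the wrapper
    }
    if (threadIdx.x == 0) { *out = v; }
}

int main() {
    int h_in[32];
    int h_out = 0;
    for (int i = 0; i < 32; ++i) { h_in[i] = i + 1; }

    int* d_in  = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    warp_sum<<<1, 32>>>(d_in, d_out);  // exactly one full warp
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    printf("%d\n", h_out);  // 1 + 2 + ... + 32 = 528
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Calling such a wrapper from inside a divergent branch, where only some lanes are active, is the situation the commit message warns about; in that case the `*_sync` primitive would need to be called directly with a mask that matches the active lanes.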
… cusolver ormqr routine call which was causing memory errors.
…workspace size was not being calculated for the cusolver ormqr routine.
…t type to ensure all tests pass.
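The ormqr workspace fix mentioned in the commits above relates to cuSOLVER's convention of querying a workspace size before calling a routine. The sketch below shows that convention in isolation, assuming single precision and a small made-up matrix; it is not the ArrayFire change itself, and the variable names are hypothetical.

```cuda
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>
#include <cusolverDn.h>

int main() {
    // Made-up sizes: A is m x k (holds the QR factors), C is m x n.
    const int m = 3, n = 2, k = 2;
    const float hA[m * k] = {1, 2, 3, 4, 5, 7};  // column-major
    const float hC[m * n] = {1, 0, 0, 0, 1, 0};  // column-major

    float *dA = nullptr, *dC = nullptr, *dTau = nullptr, *dWork = nullptr;
    int* dInfo = nullptr;
    cudaMalloc(&dA, sizeof(hA));
    cudaMalloc(&dC, sizeof(hC));
    cudaMalloc(&dTau, sizeof(float) * k);
    cudaMalloc(&dInfo, sizeof(int));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC, sizeof(hC), cudaMemcpyHostToDevice);

    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    // Query the workspace for each routine separately: the size that
    // geqrf needs is not guaranteed to be enough for ormqr.
    int lworkGeqrf = 0, lworkOrmqr = 0;
    cusolverDnSgeqrf_bufferSize(handle, m, k, dA, m, &lworkGeqrf);
    cusolverDnSormqr_bufferSize(handle, CUBLAS_SIDE_LEFT, CUBLAS_OP_T,
                                m, n, k, dA, m, dTau, dC, m, &lworkOrmqr);

    // One buffer large enough for both calls.
    const int lwork = std::max(lworkGeqrf, lworkOrmqr);
    cudaMalloc(&dWork, sizeof(float) * lwork);

    int info = 0;
    // Factor A = Q * R in place; tau receives the reflector scalars.
    cusolverDnSgeqrf(handle, m, k, dA, m, dTau, dWork, lwork, dInfo);
    cudaMemcpy(&info, dInfo, sizeof(int), cudaMemcpyDeviceToHost);
    printf("geqrf info = %d\n", info);

    // Overwrite C with Q^T * C using the reflectors stored in A.
    cusolverDnSormqr(handle, CUBLAS_SIDE_LEFT, CUBLAS_OP_T, m, n, k,
                     dA, m, dTau, dC, m, dWork, lwork, dInfo);
    cudaMemcpy(&info, dInfo, sizeof(int), cudaMemcpyDeviceToHost);
    printf("ormqr info = %d\n", info);

    cusolverDnDestroy(handle);
    cudaFree(dA);
    cudaFree(dC);
    cudaFree(dTau);
    cudaFree(dWork);
    cudaFree(dInfo);
    return 0;
}
```

The program links against cuSOLVER (for example with `nvcc example.cu -lcusolver`). Return codes are not checked here to keep the sketch short; real code would check every `cudaError_t` and `cusolverStatus_t`.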
christophe-murphy pushed a commit that referenced this pull request on Aug 30, 2024:
Note that this will be superseded by #3588
FloopCZ (Contributor) reviewed on Dec 6, 2024 and left a comment:
Hi, thank you for digging into this!
Contributor
This patch is now a part of the ArrayFire Arch Linux repository package.
edwinsolisf (Contributor) reviewed on Jan 2, 2025 and left a comment:
Tested on Windows 11; all tests passed.
Update driver versions to minimum required. Co-authored-by: Filip Matzner <FloopCZ@users.noreply.github.com>
…nning-with-cuda-126
edwinsolisf approved these changes on Jan 9, 2025
Fixes for bugs uncovered by CUDA version 12.6
Description
Fixes: #3580
Checklist