3580 bug investigate test failures when running with cuda 126 by christophe-murphy · Pull Request #3588 · arrayfire/arrayfire

christophe-murphy · 2024-08-21T00:10:44Z

Fixes for bugs uncovered by CUDA version 12.6

Description

Merged in the shuffle sync fixes from "3575 bug shfl down sync bug causes undefined behavior" (#3576).
Fixed bugs in the calls to the ormqr routine in qr and leastSquares, where the correct workspace size was not being calculated.
Loosened the tolerance on the convolution filter tests for the floating point type from 2.0e-3 to 4.0e-3. The convolution routines appear to work correctly but produce slightly different results than with older CUDA versions; the cause has not been identified.
All CUDA tests now pass for CUDA version 12.6.
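
The ormqr item above comes down to a general rule for LAPACK-style solvers: each routine reports its own scratch requirement, so a buffer sized for the factorization is not automatically large enough for the routine that applies Q. Below is a minimal host-only C++ sketch of that pattern. The two `*_buffer_size` functions and their size formulas are hypothetical stand-ins for real per-routine size queries, `ensure_workspace` and `qr_then_apply_q` are illustrative names, and none of it is taken from the ArrayFire source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for per-routine workspace queries. The formulas are
// invented for illustration; a real library reports its own values.
std::size_t geqrf_buffer_size(std::size_t m, std::size_t n) {
    (void)m;
    return 32 * n;
}

std::size_t ormqr_buffer_size(std::size_t m, std::size_t n, std::size_t k) {
    (void)k;
    return 64 * (m > n ? m : n);
}

// Grow the shared scratch buffer so it holds at least `needed` elements.
void ensure_workspace(std::vector<float>& work, std::size_t needed) {
    if (work.size() < needed) work.resize(needed);
}

// Returns the workspace size that is in place when the apply-Q step would run.
std::size_t qr_then_apply_q(std::size_t m, std::size_t n) {
    std::vector<float> work;

    // Step 1: size the buffer for the factorization.
    ensure_workspace(work, geqrf_buffer_size(m, n));
    // ... the factorization would run here ...

    // Step 2: query again for the apply-Q step instead of reusing step 1's size.
    ensure_workspace(work, ormqr_buffer_size(m, n, n));
    assert(work.size() >= ormqr_buffer_size(m, n, n));
    // ... the apply-Q step would run here ...

    return work.size();
}
```

Skipping the second query would leave the buffer at the factorization's size, which in this toy model is always smaller than what the apply-Q step asks for.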

Checklist

[x] Rebased on latest master
[x] Code compiles
[x] Tests pass
[x] Functions added to unified API
[x] Functions documented

…rp primitives and calls the new primitives for CUDA versions greater than 9 and the old ones for older CUDA versions. The new primitives take an additional argument: a mask of the warp threads participating in the operation. The old primitives always involve all the threads in a warp. The wrapper routines originally accepted a mask, which was ignored for the old primitives, but this parameter has now been removed. The reason is that with an old version of CUDA all threads must enter the wrapper routine, whereas with a new version only the threads named in the mask must enter. If threads outside the mask enter the routine, the behavior is undefined: in CUDA versions <= 12.2 the primitive executes without reporting any error, but in later versions a warp illegal instruction exception is thrown. To keep the behavior of these wrapper functions the same for old and new versions of CUDA, the mask is always set to all threads in a warp for the new primitives. A specific new primitive can still be called directly with a custom mask, as is already done in the reduce_by_key routine.
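
The mask rule described in this commit message can be modeled on the host. The sketch below is plain C++, not device code and not the ArrayFire wrappers: `call_is_defined`, `wrapper_call_is_defined`, `shfl_down_full`, and `lane_ids` are hypothetical names, and the model assumes no lane has exited early. It illustrates why a wrapper that always passes the full-warp mask requires every lane to enter.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kWarpSize = 32;
constexpr std::uint32_t kFullMask = 0xFFFFFFFFu;

using Warp = std::array<int, kWarpSize>;

// Model of the rule for the mask-taking primitives, assuming no lane has
// exited: the call is well defined only if exactly the lanes named in `mask`
// are the lanes that enter it.
bool call_is_defined(std::uint32_t entered, std::uint32_t mask) {
    return entered == mask;
}

// Model of a wrapper that hard-codes the full-warp mask: every lane must enter.
bool wrapper_call_is_defined(std::uint32_t entered) {
    return call_is_defined(entered, kFullMask);
}

// Full-warp shuffle-down: lane i reads the value held by lane i + delta and
// keeps its own value when i + delta falls outside the warp.
Warp shfl_down_full(const Warp& vals, unsigned delta) {
    assert(delta <= kWarpSize);  // the model only covers shifts up to one warp
    Warp out = vals;
    for (unsigned lane = 0; lane < kWarpSize; ++lane) {
        unsigned src = lane + delta;
        if (src < kWarpSize) out[lane] = vals[src];
    }
    return out;
}

// Convenience input: each lane holds its own lane index.
Warp lane_ids() {
    Warp w{};
    for (unsigned lane = 0; lane < kWarpSize; ++lane) {
        w[lane] = static_cast<int>(lane);
    }
    return w;
}
```

In this model a half-warp call such as `entered = 0x0000FFFF` is fine when the same value is passed as the mask, but it is flagged as undefined once the wrapper substitutes the full mask, which matches the constraint stated above.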

FloopCZ

Hi, thank you for digging into this!

src/backend/cuda/device_manager.cpp

FloopCZ · 2024-12-07T20:24:07Z

This patch is now a part of the ArrayFire Arch Linux repository package.

edwinsolisf

Tested on Windows 11, passed all tests

christophe-murphy added 5 commits August 5, 2024 14:15

Update CUDA device manager structs for new versions of CUDA and drivers up to 12.6

995c083

Fix for bug where new workspace size was not being calculated for the cusolver ormqr routine call which was causing memory errors.

18e7801

Fix for similar bug in the least squares solve routine where the new workspace size was not being calculated for the cusolver ormqr routine.

c97f942

Loosened tolerance for convolution filter tests for the floating point type to ensure all tests pass.

ee07062
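
The tolerance change in the last commit can be pictured with a small element-wise comparison helper. This is a hedged sketch, not ArrayFire's test code: `max_abs_diff` and `all_close` are hypothetical names, and the tolerance is assumed here to be an absolute bound on the per-element difference.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute element-wise difference between two equally sized vectors.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}

// True when every element of `a` is within `tol` of the matching element of `b`.
bool all_close(const std::vector<float>& a, const std::vector<float>& b, float tol) {
    return max_abs_diff(a, b) <= tol;
}
```

With invented sample data in which one element differs by about 3.0e-3, `all_close` fails at the old bound of 2.0e-3 and passes at the new bound of 4.0e-3.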

christophe-murphy linked an issue Aug 21, 2024 that may be closed by this pull request

[BUG] Investigate test failures when running with CUDA 12.6 #3580

Closed

2 tasks

christophe-murphy pushed a commit that referenced this pull request Aug 30, 2024

Update toolkit driver version for cuda 12.6 (#3586)

d3a6e2a

Note that this will be superseded by #3588

christophe-murphy mentioned this pull request Sep 12, 2024

[BUG] Rejected GPU drivers that should be allowed for CUDA version #3601

Open

2 tasks

christophe-murphy mentioned this pull request Sep 30, 2024

Add support for CUDA Toolkit 12.5 #3570

Closed

5 tasks

christophe-murphy mentioned this pull request Oct 11, 2024

Info user on minimum drivers for CUDA toolkits used by ArrayFire binaries #2480

Closed

FloopCZ reviewed Dec 6, 2024

src/backend/cuda/device_manager.cpp (outdated)

edwinsolisf reviewed Jan 2, 2025

christophe-murphy and others added 2 commits January 7, 2025 16:40

Update src/backend/cuda/device_manager.cpp

Update driver versions to minimum required. Co-authored-by: Filip Matzner <FloopCZ@users.noreply.github.com>

6a19cce

Merge branch 'master' into 3580-bug-investigate-test-failures-when-running-with-cuda-126

9cdf02f

edwinsolisf self-requested a review January 9, 2025 00:38

edwinsolisf approved these changes Jan 9, 2025

christophe-murphy merged commit f4edcf2 into master Jan 9, 2025
2 of 4 checks passed

christophe-murphy merged 7 commits into master from 3580-bug-investigate-test-failures-when-running-with-cuda-126

3 participants