Correct the conversion from float/double to half on CUDA #3627
Corrects the conversion from float/double to half.
This affects the following functions: af::min, af::max, af::dot, af::mean, af::mean_var, af::topk, and af::var on GPUs where CUDA supports FP16 only at a reduced rate (the __half type is supported, but compute is performed in float).
Description
In the device function float2half_impl, the float/double is first converted to an unsigned integer on which bit operations are performed, yielding a uint16 that holds the bit representation of the same value in native_half_t format.
On return, an extra implicit conversion was then applied, this time from uint16 to native_half_t, which re-encoded the integer value as a half instead of reinterpreting its bits.
GPUs with full __half support (type and compute) never take this float-to-__half conversion path, so they do not show the error.
My GPU is a GTX 1080 (CC 6.1), which has reduced-rate FP16 support.
Reproducible code (CUDA backend):
Output:
Additional information about the PR, answering the following questions:
Changes to Users
No changes to the API.
Checklist