Skip to content

Conversation

@willyborn
Copy link
Contributor

Corrects the conversion from a float/double to half.
This affects following functions: af::min, af::max, af::dot, af::mean, af::mean_var, af::topk, af::var for those GPU, where CUDA supports slow-rate FP16 (type __half is supported, although compute is in float)

Description

In the device function float2half_impl the float/double is converted to an unsigned integer on which bit operations are performed.
Finally we get a uint16 containing the bit presentation of the same value in native_half_t format.
During the return, an extra implicit casting was performed, this time from uint16 to the native_half_t format.

GPU supporting fully __half (type & compute) will not convert a float to __half, and will not show this error.
My GPU is a GTX1080 with CC6.1, with low-rate FP16 support.

Reproducible code, CUDA engine.

{
    float harr[] = {1, 2};
    af::array arr(2, harr);
    af::array arr_f16 = arr.as(f16);
    af::array out = min(arr_f16);
	
	af::print("array_f16", arr_f16);
	af::print("min of array_f16", out);
}

Output:

array_f16
[2 1 1 1]
   Offset: 0
   Strides: [1 2 2 2]
    1.0000
    2.0000

min of array_f16
[1 1 1 1]
   Offset: 0
   Strides: [1 1 1 1]
15360.0000

Additional information about the PR answering following questions:

  • Is this a new feature or a bug fix? Bug fix
  • Why these changes are necessary. This corrects following tests: test_var_cuda, test_topk_dua, test_reduce_cuda, test_blas_cuda, test_dot_cuda, test_mean_cuda & test_meanvar_cuda.
  • Potential impact on specific hardware, software or backends: All GPU's having slow_rate FP16 support
  • Can this PR be backported to older versions? Yes
  • Future changes not implemented in this PR. None

Changes to Users

No changes to API.

Checklist

  • Rebased on latest master
  • Code compiles
  • Tests pass, not tested on ONEAPI
  • Functions added to unified API
  • Functions documented

uint64_t bits{0}; // = *reinterpret_cast<uint64*>(&value); //violating
// strict aliasing!
std::memcpy(&bits, &value, sizeof(double));
union {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to cppreference, this type of conversion is Undefined Behavior. I think the more standard compliant way to do it would be using reinterpret_cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All explicit conversions are now updated to reinterpret_cast.

Copy link
Contributor

@edwinsolisf edwinsolisf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works, tested on Windows with an RTX 3070 Ti Mobile and on Ubuntu 22.04 LTS with a Tesla P4

@christophe-murphy christophe-murphy merged commit eef5773 into arrayfire:master Jan 16, 2025
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants