Skip to content

Conversation

@christophe-murphy
Copy link
Contributor

Modifications and bug fixes to get the OneAPI backend working with Intel OneAPI version 2024.1 This has only been tested on Linux and not Windows.

Description

Several changes needed to be made to the CMake files to support OneAPI version 2024. Calls to the Sycl queue also needed to be modified to prevent nested submissions which are now unsupported.

A number of tests were failing and bugs were identified and fixed including the following:

  • In the wrap function the 4th dimension was not being wrapped
  • The reduce function was being called with an incorrect number of threads in certain circumstances
  • Half precision support check was erroneously indicating support for some devices

A number of functions are not supported on the OneAPI backend, a build option was added to skip tests for unsupported functions rather than failing.

Fixes: #3539

Changes to Users

Additional CMake build option: AF_SKIP_UNSUPPORTED_TESTS
This will cause tests unsupported by the backend to be skipped rather than fail

Checklist

  • [ x] Rebased on latest master
  • [ x] Code compiles
  • [ x] Tests pass
  • [ x] Functions added to unified API
  • [ x] Functions documented

…'t support compiling on Windows, this will be added later.
…o the SYCL queue which is not supported. This has been fixed by moving calls to get() out of the submit calls. If a get call is made to a node that has not yet been evaluated, it will need to submit work to the SYCL queue.
…current backend. If a function is not supported the test is skipped. This works with both the C API which returns an error flag and the C++ API which throws an exception.
… precision as well as the fp16 aspect for the oneAPI backend. Some devices advertise the fp16 aspect but their native vector width for half precision is zero which results in errors when calling OpenCL routines with half precision arguments.
…re made to ireduceDimKernelSMEM for 1, 2 and 4 y threads
…ailing when AF_SKIP_UNSUPPORTED_TESTS CMake option is enabled
@christophe-murphy christophe-murphy linked an issue Jul 16, 2024 that may be closed by this pull request
///
/// \param[in] CALL This is the arrayfire C++ function
#ifdef SKIP_UNSUPPORTED_TESTS
#define FUNCTION_UNSUPPORTED \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of this approach to unsupported exceptions: Can you try using http://google.github.io/googletest/advanced.html#asserting-on-subroutines-with-an-exception

This should allow you to add a listener to catch these exceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation you linked to seems to be for turning a failed assert into an exception, not for listening for an exception but I will investigate more to see if we can do what you suggest.

static af_err CHECK_SUPPORTED_ERR_RETURN;
#define CHECK_SUPPORTED(af_stat) \
if (af_stat == AF_ERR_NOT_SUPPORTED) \
GTEST_SKIP() << "Function not supported on this backend"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to print the backend name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked into this and an error message with the back end name is generated in backend/common/err_common.cpp in the set_global_error_string function. However it is only printed if the AF_PRINT_ERRORS environment variable is set. Not sure the reason for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also notice that in the ***_NOT_SUPPORTED macros a string is being passed called 'message' however this string is then assigned to the 'backend' member of the SupportError class. So when getBackendName() is called the 'message' is being returned instead of the back end name.

Copy link
Contributor Author

@christophe-murphy christophe-murphy Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is difficult to obtain the back end name at this point in the code, it seems that it is not exposed in the API. If a C++ exception is triggered it will print out the back end as part of the verbose error message when the AF_PRINT_ERRORS environment variable is set. For the C function you just get a AF_ERR_NOT_SUPPORTED returned from the function.

test/medfilt.cpp Outdated
// 4.0000 8.0000 12.0000 16.0000
array b = medfilt(a, 3, 3, AF_PAD_ZERO);
array b;
try { b = medfilt(a, 3, 3, AF_PAD_ZERO); } catch FUNCTION_UNSUPPORTED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are examples in documentations. You should never put test related stuff in these types of code snippets.

blocks[0] *= threads[0];

sycl::nd_range<3> marange(blocks, threads);
sycl::buffer<uint> *idxArrs_get[4];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this go out of scope when you call submit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is captured by the lambda function which is passed as an argument to submit.

if (second.isReady()) {
if (1LL + jdim >= second.ndims() && second.isLinear()) {
// second & out are linear
auto second_get = second.get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need better names here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to "second_array", etc..

set(CUDA_LIMIT_GPU_ARCHITECTURE "9.0")
endif()

if(CUDA_VERSION VERSION_GREATER_EQUAL "11.8")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was any architecture removed during these releases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Support for CC < 5.0 was removed in CUDA 12. I've updated the CUDA_ALL_GPU_ARCHITECTURES list.

…eptions. If the not supported exception is thrown, the test can be skipped if the AF_SKIP_UNSUPPORTED_TESTS variable is on.
… SKIP_BACKEND macro will be made instead that will need to be explicitly added to each test that calls a function unsupported for a given backend.
…e not supported by the oneAPI back end. If AF_SKIP_UNSUPPORTED_TESTS is set to ON then all tests with this macro will be skipped. These will need to be removed as oneAPI support is added for each feature.
…It is now used for the UNSUPPORTED_BACKEND macro.
@christophe-murphy christophe-murphy merged commit b1e85d3 into master Feb 13, 2025
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Build] OneAPI version 2024 incompatible for MKL

5 participants