
I want to run a parallel reduction on the SYCL host device.

When I compile with clang++ -fsycl, it builds fine, but at run time I get the following:

```
terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  Group algorithms are not supported on host device. -33 (CL_INVALID_DEVICE)
```

This is my parallel reduction operation, which works fine on my GPU:

```cpp
#include "ddot.hpp"
#include <CL/sycl.hpp>

using namespace sycl;

// Computes *result = sum_i x[i] * y[i] with a SYCL 2020 reduction.
// time_allreduce is part of the interface but is not used in this function.
int ddot(sycl::queue& q, const int n, const double* const x, const double* const y,
         double* const result, double& time_allreduce)
{
    // Buffer with just 1 element to receive the reduction result
    double sumResult = 0.0;
    buffer<double> sumBuf{&sumResult, 1};

    // Input vectors, which the kernel only reads
    sycl::buffer X(x, sycl::range<1>(n));
    sycl::buffer Y(y, sycl::range<1>(n));

    q.submit([&](handler& cgh) {
        sycl::accessor xAcc{X, cgh, sycl::read_only};
        sycl::accessor yAcc{Y, cgh, sycl::read_only};

        // Sum reduction into sumBuf; its initial value (0.0) is included
        auto sumReduction = reduction(sumBuf, cgh, plus<>());

        unsigned long size = static_cast<unsigned long>(n);

        cgh.parallel_for(range<1>(size), sumReduction,
                         [=](id<1> idx, auto& sum) {
                             sum += xAcc[idx] * yAcc[idx];
                         });
    });
    q.wait();

    // Read the reduced value back on the host
    *result = sumBuf.get_host_access()[0];
    return 0;
}
```
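Whichever device ends up running the kernel, it can help to compare its output against a plain serial dot product computed on the host. The sketch below uses only the C++ standard library; ddot_reference and close_enough are hypothetical helper names, and the tolerance is an assumption, since a parallel reduction may add the terms in a different order than a serial loop.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>

// Hypothetical serial reference for ddot: returns sum_i x[i] * y[i],
// computed on the host without any SYCL device.
double ddot_reference(int n, const double* x, const double* y) {
  return std::inner_product(x, x + n, y, 0.0);
}

// Hypothetical check: a parallel reduction may sum in a different order
// than the serial loop, so compare with a relative tolerance, not ==.
bool close_enough(double got, double want, double rel_tol = 1e-12) {
  return std::fabs(got - want) <= rel_tol * std::max(1.0, std::fabs(want));
}
```

For example, after calling ddot(q, n, x, y, &r, t), close_enough(r, ddot_reference(n, x, y)) should hold on any device that runs the kernel correctly.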
  • The SYCL 2020 spec doesn't require that the "host device" be implemented, which is perhaps why SYCL 2020 reductions (which likely use group algorithms internally) do not work on it. Consider using an OpenCL CPU device on the host instead (this is different from the "host device"). Commented Mar 4, 2024 at 9:46
  • Okay, thank you! Can you please share with me where it says this? Commented Mar 4, 2024 at 14:15
  • It's in the section of the spec that describes the differences between 1.2.1 and 2020: registry.khronos.org/SYCL/specs/sycl-2020/html/… . I found it easiest to search for "host device" in the spec, since there are very few instances of it. The 1.2.1 spec (page 273) states that the host device compiles to native code and may not obtain the same performance as the OpenCL device, but is more easily debuggable: registry.khronos.org/SYCL/specs/sycl-1.2.1.pdf Commented Mar 4, 2024 at 14:21
