173 questions
0
votes
0
answers
43
views
Deadlock with SYCL Graph and FFT node?
We use a SYCL command_graph to process data. The processing is configured at runtime.
One of the (possible) processing steps is a real-to-complex FFT; for this we use Intel's oneAPI MKL DFT.
If a fft ...
0
votes
1
answer
61
views
SYCL avoiding unnecessary memory transfers and Intel oneAPI guide examples
I'd like to clarify which memory transfers are done in two examples of SYCL code in Intel's oneAPI guide. The guide makes one point that contradicts my impression about when memory transactions are ...
0
votes
0
answers
69
views
How to solve the problem of "SYCL exception caught: Native API failed. Native API returns: 68 (UR_RESULT_ERROR_IN_EVENT_LIST_EXEC_STATUS)"?
The error is caught by the for loop below:
try {
for (int t = 0; t < T; t++) {
if (t % 2 == 0) {
// Run on CPU
q_cpu.submit([&](sycl::handler& h) {
...
3
votes
0
answers
167
views
An error occurs when creating queues with gpu selector in SYCL program
This is my SYCL program on windows to check if SYCL works fine in my system
#include <iostream>
#include <cl/sycl.hpp>
#include <Windows.h>
int main()
{
int array[5];
{
...
4
votes
1
answer
169
views
Matrix multiplication with SYCL-sub-groups goes wrong
I'm new to SYCL. The following code compiles fine but gives the wrong result.
The code computes the product of two matrices (A, RM x K and B, RK x N) into a result matrix (C, RM x N).
#include <...
2
votes
1
answer
94
views
Unpacking member type from std::tuple types
I have a collection of structs, each with a data_t member type (other members omitted):
struct A { using data_t = int; };
struct B { using data_t = double; };
struct C { using data_t = bool; };
// etc.
...
0
votes
2
answers
145
views
Concurrent kernels using oneAPI SYCL and Nvidia GPU plugin
I'm trying to run two kernels concurrently on a single Nvidia GPU using oneAPI SYCL and the Nvidia plugin. Is this possible? If not, why? Here is where I'm at so far: I'm able to run two kernels ...
3
votes
2
answers
182
views
SYCL and standard ISO C++
One of the claimed advantages of SYCL (heterogeneous computing by Khronos) is that the source code is standard C++17 source code. I am not sure how to understand this. Is it a strict statement or just ...
-1
votes
1
answer
97
views
Why is the same GPU in a different context?
I have two RTX 3090 GPUs, and I want to bind them in one context.
But the code below:
int main() {
std::vector<sycl::device> devices = sycl::device::get_devices();
std::vector<sycl::device&...
1
vote
0
answers
64
views
Can you invoke a number of SYCL kernels based on an integer stored in device memory?
Does SYCL have a feature analogous to Direct3D's or Unity's ComputeShader.DispatchIndirect, where you can launch x kernels, with x being an integer stored in GPU memory? This avoids having to ...
1
vote
0
answers
99
views
Querying the maximum number of concurrent kernel launches in SYCL
In my application I perform the same computation on batches of problems. I do, however, require some intermediate data to be allocated for these computations, and therefore I've resorted to function ...
-3
votes
1
answer
146
views
error: redefinition of _Is_memfunptr when trying to compile with icx
I'm new to SYCL, and I'm having some trouble compiling an almost empty program. I really don't understand how to fix the redefinition of type_traits.
I've tried all the compilation command options ...
2
votes
0
answers
235
views
Is this the right equivalent of `cublasSgemmStridedBatched`?
I'm trying to port some CUDA code which uses cuBLAS to SYCL, using oneMKL's BLAS. oneMKL appears to be very slow, though.
This is the specific snippet:
cublasCheck(cublasSgemmStridedBatched(...
1
vote
1
answer
612
views
How to build a shared or static library with SYCL using DPC++
I am trying to build a shared Linux library that can be distributed and linked, like any normal shared library. We have recently ported our HPC GPU routines from CUDA to SYCL in order to be cross-...
1
vote
1
answer
389
views
Failing to compile and link PcapPlusPlus library with SYCL code
I want to use the C++ library PcapPlusPlus and its header files in my SYCL code. More precisely, I want to compile it with the Intel C++ Compiler (icpx). I know how to program and know how C, Java and ...
1
vote
2
answers
214
views
SYCL NDRange and Hierarchical: Why is one of them not enough?
SYCL offers NDRange and Hierarchical kernel parallelism abstractions.
My questions:
Is it true that NDRange maps better onto GPU hardware and Hierarchical parallelism maps better onto ...
2
votes
0
answers
51
views
Group Algorithms not supported on host device
I want to run a parallel reduction operation on my host device.
When I compile using clang++ -fsycl it compiles fine but when I run it I get the following:
terminate called after throwing an ...
-1
votes
1
answer
96
views
Follow symbol does not recognize SYCL functions and keywords in Qt Creator
I am using Qt Creator 4.4.0 to develop a SYCL application. I am not able to follow the SYCL functions into the headers to check for their definitions in Qt Creator like I can do with other standard headers ...
1
vote
1
answer
223
views
GPGPU with Radeon Pro VII in Windows [closed]
I'll start with the question, in case somebody can answer without going through the whole post:
What is the easiest way to start programming with a Radeon Pro VII in C++ in Windows?
And for anyone ...
0
votes
1
answer
605
views
SYCL: No kernel name was found
There are other similar questions regarding this issue, but their answers do not solve my case:
terminate called after throwing an instance of 'sycl::_V1::runtime_error' what(): No kernel named ...
1
vote
0
answers
354
views
Parallel Reduction with SYCL
I'm trying to perform a parallel reduction with SYCL, but after every calculation it seems my device fails to copy the values back to my host device. Attached is a snippet of my code:
int ddot (...
4
votes
1
answer
258
views
cuEventCreate invoked when launching kernel with SYCL code
I have recently ported my legacy CUDA code to SYCL using oneAPI for NVIDIA GPUs. The code runs fine but is two times slower than the native CUDA code. After profiling, I found the following. ...
0
votes
0
answers
154
views
Code only runs on optimization level 1 (-O1) using ICPX with SYCL application
I am developing a SYCL application using the Intel ICPX compiler. While the code executes successfully in debug mode with the -O0 optimization level, I encounter issues in the release version: the ...
0
votes
4
answers
521
views
Segmentation fault when I use the gemm function of the DPC++ BLAS library on an NVIDIA GPU
I installed the oneAPI Base Toolkit and HPC Toolkit (2024.0) on a public cluster to test the performance of gemm,
but I got a segmentation fault. I don't know how to fix this problem.
I used the offline installer ...
-1
votes
1
answer
218
views
SYCL q.memcpy() & h.memcpy() & Intel developer cloud problems
I'm using SYCL on Intel Developer Cloud to test innovative algorithms.
My questions:
SYCL q.memcpy() & h.memcpy() do not work. It seems that Intel knows about it.
What is the status of this issue? ...
1
vote
1
answer
490
views
SYCL GPU device query - Is the GPU device discrete or integrated?
I am writing a SYCL application for which I need to distinguish between discrete GPU devices and integrated GPU devices. Is there any way, directly or indirectly, to know if the GPU device I selected
(for ...
0
votes
2
answers
158
views
SYCL - Cannot find the origin of memory corruption
I've been writing a ray tracer using SYCL for a few weeks but I'm now facing a memory corruption issue and I really can't find where it's coming from.
I'm working on Windows 11 22H2 using the Intel ...
0
votes
2
answers
508
views
CMake file for SYCL CUDA backend
I am having trouble writing a CMake file to offload SYCL code to the NVIDIA backend. My CMake file currently looks like this
cmake_minimum_required(VERSION 3.22.1)
set(CMAKE_C_COMPILER /opt/intel/...
0
votes
1
answer
214
views
Using pointers in SYCL class
Is it possible to copy a class containing pointers to its internal attribute using SYCL and offload it to the graphics card? Basically, I try to reference members to avoid unnecessary memory usage. I ...
0
votes
1
answer
580
views
Using Classes in SYCL
I am trying to adopt an OOP software design strategy for a SYCL project I was working on.
I got my code running in its C++ version, and then I attempted to convert it to SYCL while trying to make the ...
3
votes
2
answers
151
views
Is `const_cast`ing away const on a reference worth it to preserve the API?
We have a specific case with the GPU programming paradigm SYCL, as described in this fix request, where we want to use read-only access from a buffer. Specifically, imagine a use case like:
namespace ...
0
votes
0
answers
245
views
Illegal instruction on execution
I'm trying to install hipSYCL on an NVIDIA GPU. hipSYCL is installed, but it gives an illegal instruction error when I try to run the syclcc compiler.
I tried running syclcc command on CPU and here is its ...
0
votes
2
answers
135
views
How do I Manage a Stateful Data Structure in Local Memory Shared by All Workitems in an OpenCL/SYCL Workgroup
I'm trying to optimize my memory-bound numerical simulation kernel in OpenCL/SYCL using local memory to allow data sharing between workitems, so that I can reduce redundant global memory traffic.
When ...
0
votes
1
answer
224
views
SYCL USM on integrated devices
SYCL USM will work on discrete GPU if-and-only-if the GPU's hardware supports unified virtual address space.
What is the case regarding integrated GPUs?
Can we assume that any integrated GPU supports USM?
...
2
votes
1
answer
205
views
Purpose of `use_host_ptr` property in SYCL
What is the point of use_host_ptr property in SYCL?
Why will the SYCL runtime not use the memory pointed to by the provided host pointer?
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-...
0
votes
1
answer
962
views
SYCL - no GPU platform detected in Windows Visual Studio
I want to run a workload on an NVIDIA GPU with SYCL in Windows 10 Pro 21H2 19044.3086. The SYCL guide states that the CUDA backend is supported on Windows:
Build DPC++ toolchain with support for NVIDIA CUDA
To ...
16
votes
1
answer
2k
views
4000% Performance Decrease in SYCL when using Unified Shared Memory instead of Device Memory
In SYCL, there are three types of memory: host memory, device memory, and Unified Shared Memory (USM). For host and device memory, data exchange requires explicit copying. Meanwhile, data movement ...
0
votes
2
answers
119
views
Does Valgrind support profiling SYCL applications?
I'm trying to identify Valgrind's support for different programming languages. I just want to find out whether Valgrind supports SYCL applications and, if it does, how to profile a SYCL application. If ...
1
vote
3
answers
178
views
How do I Implement a Custom 4-Dimensional Array Viewer/Wrapper in SYCL 2020 / DPC++?
In conventional C++, it's possible to create a multi-dimensional "viewer" or "wrapper" to a 1D buffer in linear memory by (1) defining a custom ArrayWrapper class, (2) overriding ...
1
vote
0
answers
330
views
cannot copy results from device allocated memory to host SYCL unified shared memory
I'm new to SYCL and trying to run a very simple vector addition program using ComputeCpp.
#include <sycl/sycl.hpp>
#include <iostream>
class vector_addition;
class vector_initialization;
...
0
votes
2
answers
358
views
Compile and dynamically link generated C code in SYCL
I have "*.c" files generated at runtime with a function implementation int foo(int, int):
extern "C"{
int foo(int a, int b)
{
return a + b;
}
}
I want to use these ...
0
votes
1
answer
1k
views
SYCL get_devices() returns just the CPU although I have an integrated Intel Iris Xe and a dedicated NVIDIA GPU
Currently I am working on a project using DPC++. I have worked for a while in the Intel DevCloud. I haven't had any problems using computing resources. When I select a GPU, it works as expected. However, ...
0
votes
1
answer
480
views
No kernel named was found. First SYCL app
I'm trying to code my first SYCL app. Just some falling sand.
The details aren't important: if a cell has sand and the cell beneath is empty, move the sand, else bottom left or bottom right or if no ...
0
votes
1
answer
185
views
Cannot allocate more than 256MB in one allocation on my discrete Intel GPU using SYCL/DPC++ on Linux
When trying to allocate more than 256MB in one allocation on a discrete Intel GPU using SYCL/DPC++ on Linux, I get a runtime error and the program exits immediately, despite having significantly more ...
0
votes
3
answers
459
views
SYCL dot product code gives wrong results
In the process of learning SYCL/DPC++, I wrote a SYCL GPU-enabled dot product code (full code on GitHub).
#include <iostream>
#include <sstream>
#include <cmath>
#include <CL/sycl....
1
vote
1
answer
265
views
What is the contiguous dimension in a SYCL kernel? In a buffer? In an image?
What is the contiguous dimension in an N-dimensional SYCL kernel, i.e. the dimension in which threads of a work-group are expected to belong to the same warp/wavefront? I would have expected it to be ...
0
votes
1
answer
352
views
Device-wide synchronization in SYCL on NVIDIA GPUs
Context
I'm porting a complex CUDA application to SYCL which uses multiple cudaStream to launch the kernels. In addition, it also uses the default Stream in some cases, forcing a device-wide ...
0
votes
1
answer
383
views
Mix JIT and AOT compilation for SYCL devices
I have a program with a variety of kernels. In production these kernels run on a GPU device and require JIT (just-in-time) compilation because we use specialisation constants. For testing we run on ...
2
votes
1
answer
770
views
Is there 1 SYCL implementation to rule all platforms?
Apologies for the slightly jokey title, but I couldn't find another way to concisely describe the question. I work in a team that use predominantly OpenCL code with a CPU fallback. For the most part ...
0
votes
1
answer
273
views
Facing error: SYCL kernel cannot call a recursive function
I was running this code using SYCL when this error about recursion came up: "error: SYCL kernel cannot call a recursive function". I am not sure what is causing this error.
I used ...