In https://github.com/arrayfire/arrayfire/tree/master/src/backend/opencl/kernel/reduce_blocks_by_key_dim.cl (and similar kernels), ArrayFire makes use of the following construct:
Tk work_group_scan_inclusive_add(__local Tk *arr) {
__local Tk tmp[DIMX];
__local int *l_val;
My OpenCL compiler rejects it with the following error:
input.cl:152:16: error: non-kernel function variable cannot be declared in local address space
This error appears to be legitimate on the compiler's side. The OpenCL 2.0 reference page for __local (https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/local.html) states:
Variables allocated in the __local address space inside a kernel function must occur at kernel function scope.
Here, tmp is declared inside work_group_scan_inclusive_add, which is not a kernel function, so this requirement is not met.
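One common way to satisfy the rule is to declare the __local arrays at kernel scope and hand them to the helper as pointers: the pointer variable itself lives in the private address space, which any function may declare. The sketch below applies that to a work-group inclusive add-scan. It is a hypothetical illustration, not ArrayFire's fix: the helper name block_scan_inclusive_add (chosen because OpenCL C 2.0 also has a built-in work_group_scan_inclusive_add), the extra tmp parameter, the scan_block kernel, int as Tk, and DIMX defaulting to 1 are all assumptions. The #ifndef __OPENCL_VERSION__ block only supplies stand-ins so the file also builds as plain C for a one-work-item smoke test; an OpenCL C compiler skips it.

```c
#ifndef __OPENCL_VERSION__
/* Plain-C stand-ins (assumption): host-side smoke test with ONE work-item
   only. An OpenCL C compiler defines __OPENCL_VERSION__ and skips this. */
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 0
static void barrier(int flags) { (void)flags; }
static size_t get_local_id(unsigned dim) { (void)dim; return 0; }
static size_t get_local_size(unsigned dim) { (void)dim; return 1; }
static size_t get_global_id(unsigned dim) { (void)dim; return 0; }
#endif

#ifndef DIMX
#define DIMX 1 /* work-group size; assumed to be supplied as a build option */
#endif

typedef int Tk; /* stand-in for ArrayFire's build-time element type */

/* Hypothetical helper: inclusive add-scan over one work-group
   (Hillis-Steele, ping-ponging between arr and tmp). Both buffers are
   owned by the calling kernel; only pointers to them are declared here,
   which is legal in a non-kernel function. arr is clobbered. */
Tk block_scan_inclusive_add(__local Tk *arr, __local Tk *tmp) {
    const size_t lid = get_local_id(0);
    const size_t n = get_local_size(0);
    __local Tk *src = arr;
    __local Tk *dst = tmp;
    for (size_t offset = 1; offset < n; offset <<= 1) {
        /* Wait until every work-item has finished the previous step. */
        barrier(CLK_LOCAL_MEM_FENCE);
        dst[lid] = (lid >= offset) ? src[lid] + src[lid - offset] : src[lid];
        __local Tk *swap = src;
        src = dst;
        dst = swap;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    return src[lid];
}

/* Hypothetical kernel: the __local arrays are declared at kernel-function
   scope, as the specification requires, then passed to the helper. */
__kernel void scan_block(__global const Tk *in, __global Tk *out) {
    __local Tk vals[DIMX];
    __local Tk tmp[DIMX];
    const size_t lid = get_local_id(0);
    const size_t gid = get_global_id(0);
    vals[lid] = in[gid];
    out[gid] = block_scan_inclusive_add(vals, tmp);
}
```

Alternatively, the scratch buffer can be declared as a __local kernel argument instead of a fixed-size array; its size is then set on the host with clSetKernelArg(kernel, arg_index, size_in_bytes, NULL).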