In some scenarios, after downsizing an array with an in-place call to rows() (i.e., overwriting an array with a subset of its own rows), elementwise value assignment silently fails: the element's value does not change, and there is no compile-time or runtime error.
Description
- Did you build ArrayFire yourself or did you use the official installers: Built myself.
- Which backend is experiencing this issue? CUDA.
- Do you have a workaround? No.
- Can the bug be reproduced reliably on your system? Yes.
- A clear and concise description of what you expected to happen: Expected elementwise value assignment to succeed (e.g., in the example below, Drows(0,0) should end up with the value 1234). Given that it fails, a compile-time or runtime error was expected instead.
- Run your executable with AF_TRACE=all and AF_PRINT_ERRORS=1 environment variables set:
# AF_TRACE=all AF_PRINT_ERRORS=1 aftest
[platform][1708117940][9998] [ /tmp/arrayfire/src/backend/common/DependencyModule.cpp:102 ] Attempting to load: libforge.so
[platform][1708117940][9998] [ /tmp/arrayfire/src/backend/common/DependencyModule.cpp:107 ] Unable to open forge
[platform][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:497 ] CUDA Driver supports up to CUDA 12.3.0 ArrayFire CUDA Runtime 11.3.0
[platform][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:478 ] CUDA driver version(12.3.0) not part of the CudaToDriverVersion array. Please create an issue or a pull request on the ArrayFire repository to update the CudaToDriverVersion variable with this version of the CUDA runtime.
[platform][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:566 ] Found 1 CUDA devices
[platform][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:588 ] Found device: NVIDIA RTX A3000 Laptop GPU (sm_86) (5.80 GB | ~12187.5 GFLOPs | 32 SMs)
[platform][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:652 ] AF_CUDA_DEFAULT_DEVICE:
[platform][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:670 ] Default device: 0(NVIDIA RTX A3000 Laptop GPU)
[mem][1708117942][9998] [ /tmp/arrayfire/src/backend/common/DefaultMemoryManager.cpp:127 ] memory[0].max_bytes: 4.8 GB
[mem][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/memory.cpp:155 ] nativeAlloc: 1 KB 0x7fe876800000
[jit][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/compile_module.cpp:472 ] {14966320269747309860 : loaded from /root/.arrayfire/KER14966320269747309860_CU_86_AF_39.bin for NVIDIA RTX A3000 Laptop GPU }
[kernel][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/Kernel.hpp:37 ] Launching arrayfire::cuda::range<float>: Blocks: [1, 1, 1] Threads: [32, 8, 1] Shared Memory: 0
Drows
[5 5 1 1]
[mem][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/memory.cpp:155 ] nativeAlloc: 1 KB 0x7fe876800400
[jit][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/compile_module.cpp:472 ] {291210446400920389 : loaded from /root/.arrayfire/KER291210446400920389_CU_86_AF_39.bin for NVIDIA RTX A3000 Laptop GPU }
[kernel][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/Kernel.hpp:37 ] Launching arrayfire::cuda::transpose<float,false,false>: Blocks: [1, 1, 1] Threads: [32, 8, 1] Shared Memory: 0
0.0000 0.0000 0.0000 0.0000 0.0000
1.0000 1.0000 1.0000 1.0000 1.0000
2.0000 2.0000 2.0000 2.0000 2.0000
3.0000 3.0000 3.0000 3.0000 3.0000
4.0000 4.0000 4.0000 4.0000 4.0000
[jit][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/compile_module.cpp:472 ] {8447298384643760287 : loaded from /root/.arrayfire/KER8447298384643760287_CU_86_AF_39.bin for NVIDIA RTX A3000 Laptop GPU }
[kernel][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/Kernel.hpp:37 ] Launching arrayfire::cuda::memCopy<float>: Blocks: [1, 5, 1] Threads: [32, 1, 1] Shared Memory: 0
[jit][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/compile_module.cpp:472 ] {8641389271879371835 : loaded from /root/.arrayfire/KER8641389271879371835_CU_86_AF_39.bin for NVIDIA RTX A3000 Laptop GPU }
[kernel][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/jit.cpp:512 ] Launching : Dims: [1,1,1,1] Blocks: [1 1 1] Threads: [128 1 1] threads: 128
Drows
[4 5 1 1]
[kernel][1708117942][9998] [ /tmp/arrayfire/src/backend/cuda/Kernel.hpp:37 ] Launching arrayfire::cuda::transpose<float,false,false>: Blocks: [1, 1, 1] Threads: [32, 8, 1] Shared Memory: 0
0.0000 0.0000 0.0000 0.0000 0.0000
1.0000 1.0000 1.0000 1.0000 1.0000
2.0000 2.0000 2.0000 2.0000 2.0000
3.0000 3.0000 3.0000 3.0000 3.0000
Reproducible Code and/or Steps
Program/output (note the (0,0) element of the last print):
#include <arrayfire.h>
using namespace af;

int main(int argc, char **argv)
{
    array Drows = range(dim4(5,5));
    af_print(Drows);
    Drows = Drows.rows(0,3); // downsize in place: keep rows 0..3
    Drows(0,0) = 1234;       // silently has no effect
    af_print(Drows);
    return 0;
}
# aftest
Drows
[5 5 1 1]
0.0000 0.0000 0.0000 0.0000 0.0000
1.0000 1.0000 1.0000 1.0000 1.0000
2.0000 2.0000 2.0000 2.0000 2.0000
3.0000 3.0000 3.0000 3.0000 3.0000
4.0000 4.0000 4.0000 4.0000 4.0000
Drows
[4 5 1 1]
0.0000 0.0000 0.0000 0.0000 0.0000
1.0000 1.0000 1.0000 1.0000 1.0000
2.0000 2.0000 2.0000 2.0000 2.0000
3.0000 3.0000 3.0000 3.0000 3.0000
Interestingly, initializing the array as a copy of an existing array before downsizing yields the expected behavior:
#include <arrayfire.h>
using namespace af;

int main(int argc, char **argv)
{
    array D = range(dim4(5,5));
    array Drows = D;         // Drows starts as a copy of D
    af_print(Drows);
    Drows = Drows.rows(0,3); // downsize in place
    Drows(0,0) = 1234;       // now takes effect as expected
    af_print(Drows);
    return 0;
}
# aftest
Drows
[5 5 1 1]
0.0000 0.0000 0.0000 0.0000 0.0000
1.0000 1.0000 1.0000 1.0000 1.0000
2.0000 2.0000 2.0000 2.0000 2.0000
3.0000 3.0000 3.0000 3.0000 3.0000
4.0000 4.0000 4.0000 4.0000 4.0000
Drows
[4 5 1 1]
1234.0000 0.0000 0.0000 0.0000 0.0000
1.0000 1.0000 1.0000 1.0000 1.0000
2.0000 2.0000 2.0000 2.0000 2.0000
3.0000 3.0000 3.0000 3.0000 3.0000
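The contrast between the two cases suggests a possible (hypothetical, untested here) workaround for the failing program: force a deep copy when downsizing, so the subsequent assignment does not target a view into the original buffer. This sketch assumes af::array::copy(), which returns a deep copy of the array, detaches the result from the parent allocation:

```cpp
// Hypothetical workaround sketch (untested): deep-copy the row slice
// instead of keeping a view, so the elementwise write lands in fresh memory.
#include <arrayfire.h>
using namespace af;

int main()
{
    array Drows = range(dim4(5, 5));
    Drows = Drows.rows(0, 3).copy(); // copy() assumed to force a deep copy
    Drows(0, 0) = 1234;              // should now modify the copied buffer
    af_print(Drows);
    return 0;
}
```

Whether this actually sidesteps the bug would need to be confirmed on an affected build; it is offered only as a direction, not a verified fix.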
EDIT: Additionally, here is the full debugging output for the successful case (note the additional [mem] line before the final af_print):
# AF_TRACE=all AF_PRINT_ERRORS=1 aftest
[platform][1708119537][10870] [ /tmp/arrayfire/src/backend/common/DependencyModule.cpp:102 ] Attempting to load: libforge.so
[platform][1708119537][10870] [ /tmp/arrayfire/src/backend/common/DependencyModule.cpp:107 ] Unable to open forge
[platform][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:497 ] CUDA Driver supports up to CUDA 12.3.0 ArrayFire CUDA Runtime 11.3.0
[platform][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:478 ] CUDA driver version(12.3.0) not part of the CudaToDriverVersion array. Please create an issue or a pull request on the ArrayFire repository to update the CudaToDriverVersion variable with this version of the CUDA runtime.
[platform][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:566 ] Found 1 CUDA devices
[platform][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:588 ] Found device: NVIDIA RTX A3000 Laptop GPU (sm_86) (5.80 GB | ~12187.5 GFLOPs | 32 SMs)
[platform][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:652 ] AF_CUDA_DEFAULT_DEVICE:
[platform][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/device_manager.cpp:670 ] Default device: 0(NVIDIA RTX A3000 Laptop GPU)
ArrayFire v3.9.0 (CUDA, 64-bit Linux, build b59a1ae53)
Platform: CUDA Runtime 11.3, Driver: 545.23.08
[0] NVIDIA RTX A3000 Laptop GPU, 5938 MB, CUDA Compute 8.6
[mem][1708119537][10870] [ /tmp/arrayfire/src/backend/common/DefaultMemoryManager.cpp:127 ] memory[0].max_bytes: 4.8 GB
[mem][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/memory.cpp:155 ] nativeAlloc: 1 KB 0x7fe1c8800000
[jit][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/compile_module.cpp:472 ] {14966320269747309860 : loaded from /root/.arrayfire/KER14966320269747309860_CU_86_AF_39.bin for NVIDIA RTX A3000 Laptop GPU }
[kernel][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/Kernel.hpp:37 ] Launching arrayfire::cuda::range<float>: Blocks: [1, 1, 1] Threads: [32, 8, 1] Shared Memory: 0
Drows
[5 5 1 1]
[mem][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/memory.cpp:155 ] nativeAlloc: 1 KB 0x7fe1c8800400
[jit][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/compile_module.cpp:472 ] {291210446400920389 : loaded from /root/.arrayfire/KER291210446400920389_CU_86_AF_39.bin for NVIDIA RTX A3000 Laptop GPU }
[kernel][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/Kernel.hpp:37 ] Launching arrayfire::cuda::transpose<float,false,false>: Blocks: [1, 1, 1] Threads: [32, 8, 1] Shared Memory: 0
0.0000 0.0000 0.0000 0.0000 0.0000
1.0000 1.0000 1.0000 1.0000 1.0000
2.0000 2.0000 2.0000 2.0000 2.0000
3.0000 3.0000 3.0000 3.0000 3.0000
4.0000 4.0000 4.0000 4.0000 4.0000
[jit][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/compile_module.cpp:472 ] {8447298384643760287 : loaded from /root/.arrayfire/KER8447298384643760287_CU_86_AF_39.bin for NVIDIA RTX A3000 Laptop GPU }
[kernel][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/Kernel.hpp:37 ] Launching arrayfire::cuda::memCopy<float>: Blocks: [1, 5, 1] Threads: [32, 1, 1] Shared Memory: 0
[jit][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/compile_module.cpp:472 ] {8641389271879371835 : loaded from /root/.arrayfire/KER8641389271879371835_CU_86_AF_39.bin for NVIDIA RTX A3000 Laptop GPU }
[kernel][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/jit.cpp:512 ] Launching : Dims: [1,1,1,1] Blocks: [1 1 1] Threads: [128 1 1] threads: 128
Drows
[4 5 1 1]
[mem][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/memory.cpp:155 ] nativeAlloc: 1 KB 0x7fe1c8800800
[kernel][1708119537][10870] [ /tmp/arrayfire/src/backend/cuda/Kernel.hpp:37 ] Launching arrayfire::cuda::transpose<float,false,false>: Blocks: [1, 1, 1] Threads: [32, 8, 1] Shared Memory: 0
1234.0000 0.0000 0.0000 0.0000 0.0000
1.0000 1.0000 1.0000 1.0000 1.0000
2.0000 2.0000 2.0000 2.0000 2.0000
3.0000 3.0000 3.0000 3.0000 3.0000
System Information
- ArrayFire version: 3.9.0
- Devices installed on the system: NVIDIA RTX A3000 Laptop GPU
- Output from the af::info() function:
ArrayFire v3.9.0 (CUDA, 64-bit Linux, build b59a1ae53)
Platform: CUDA Runtime 11.3, Driver: 545.23.08
[0] NVIDIA RTX A3000 Laptop GPU, 5938 MB, CUDA Compute 8.6
- Output from the bug report script:
Linux output:
# bash afbugreport.sh
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
name, memory.total [MiB], driver_version
NVIDIA RTX A3000 Laptop GPU, 6144 MiB, 545.23.08
rocm-smi not found.
clinfo not found.
Checklist
- Using the latest available ArrayFire release
- GPU drivers are up to date