Issue description
With PyTorch 1.2.0 I'm seeing a couple of the test_numba_integration tests fail:
test_from_cuda_array_interface_lifetime fails at the first assert:
def test_from_cuda_array_interface_lifetime(self):
    """torch.as_tensor(obj) tensor grabs a reference to obj so that the lifetime of obj exceeds the tensor"""
    numba_ary = numba.cuda.to_device(numpy.arange(6))
    torch_ary = torch.as_tensor(numba_ary, device="cuda")
    self.assertEqual(torch_ary.__cuda_array_interface__, numba_ary.__cuda_array_interface__)  # No copy
    ^^^^^
test_from_cuda_array_interface_active_device fails at the equivalent assert, but would otherwise fail at a later assert, when an expected RuntimeError is not raised:
def test_from_cuda_array_interface_active_device(self):
    """torch.as_tensor() tensor device must match active numba context."""
    # Both torch/numba default to device 0 and can interop freely
    numba_ary = numba.cuda.to_device(numpy.arange(6))
    torch_ary = torch.as_tensor(numba_ary, device="cuda")
    self.assertEqual(torch_ary.cpu().data.numpy(), numpy.asarray(numba_ary))
    self.assertEqual(torch_ary.__cuda_array_interface__, numba_ary.__cuda_array_interface__)
    ^^^^^

    # Torch should raise `RuntimeError` when the Numba and Torch device differ
    numba_ary = numba.cuda.to_device(numpy.arange(6))
    with self.assertRaises(RuntimeError):
        torch.as_tensor(numba_ary, device=torch.device("cuda", 1))
        ^^^^^
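For what it's worth, the dict comparison appears to fail because the two interfaces report different data pointers, i.e. torch.as_tensor has presumably made a copy. A quick diagnostic along these lines (my own sketch, not part of the test suite) makes that visible:

```python
import numba.cuda
import numpy
import torch

numba_ary = numba.cuda.to_device(numpy.arange(6))
torch_ary = torch.as_tensor(numba_ary, device="cuda")

# With a true zero-copy conversion both objects would report the same
# device pointer; on my setup they differ, i.e. the tensor is backed by a copy.
numba_ptr = numba_ary.__cuda_array_interface__["data"][0]
torch_ptr = torch_ary.__cuda_array_interface__["data"][0]
print(numba_ptr, torch_ptr, numba_ptr == torch_ptr)
```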
One of these failures in test_from_cuda_array_interface_active_device is mentioned in #21269, but that PR goes on to fix a different test failure also mentioned there.
These tests were added by commit 5d8879c as part of #20584.
The commit comment describes the behavior that the tests are expecting:
- Zero-copy: When using touch.as_tensor(..., device=D) where D is the same device as the one used in __cuda_array_interface__.
- Implicit copy: When using touch.as_tensor(..., device=D) where D is the CPU or another non-CUDA device.
  ...
- Exception: When using touch.as_tensor(..., device=D) where D is a CUDA device not used in __cuda_array_interface__.
- Lifetime: torch.as_tensor(obj) tensor grabs a reference to obj so that the lifetime of obj exceeds the tensor
Pull Request resolved: #20584
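To make that contract concrete, here is a minimal sketch of my reading of the description (assuming a two-GPU host with the Numba array on device 0; this is my own illustration, not code from the PR):

```python
import numba.cuda
import numpy
import torch

numba_ary = numba.cuda.to_device(numpy.arange(6))  # lives on CUDA device 0

# Zero-copy: same CUDA device, so the tensor should alias the Numba buffer.
torch_ary = torch.as_tensor(numba_ary, device="cuda")
assert torch_ary.data_ptr() == numba_ary.__cuda_array_interface__["data"][0]

# Implicit copy: a non-CUDA target device should force a copy to host memory.
cpu_ary = torch.as_tensor(numba_ary, device="cpu")
assert cpu_ary.device.type == "cpu"

# Exception: a CUDA device other than the one backing the array should raise.
try:
    torch.as_tensor(numba_ary, device=torch.device("cuda", 1))
except RuntimeError:
    pass
else:
    raise AssertionError("expected RuntimeError for a mismatched CUDA device")
```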
But PyTorch seems to always be copying in practice (and the copy works even between different GPUs), so none of these assertions hold.
I'm not sure whether the tests or the underlying code are at fault here. The original commit claims that the CUDA device "used in" __cuda_array_interface__ should figure into the behavior, but __cuda_array_interface__ doesn't carry the device reference (at least not directly or explicitly), so I'm not sure where that distinction would be made.
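For reference, the interface dict only exposes the raw pointer plus shape, strides and typestr (and a version number), so presumably any such device check would have to infer the owning device from the pointer rather than read it from the dict:

```python
import pprint

import numba.cuda
import numpy

numba_ary = numba.cuda.to_device(numpy.arange(6))

# Keys are 'data' (pointer, read-only flag), 'shape', 'strides', 'typestr'
# and 'version' (the exact set may vary by numba version) -- no device field.
pprint.pprint(numba_ary.__cuda_array_interface__)
```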
System Info
- PyTorch or Caffe2: PyTorch 1.2.0
- How you installed PyTorch (conda, pip, source): conda install pytorch torchvision cudatoolkit=10.0 -c pytorch and built from source
- Build command you used (if compiling from source):
>>> pprint.pprint(torch.__config__.show())
('PyTorch built with:\n'
' - GCC 7.2\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - CUDA Runtime 10.1\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75\n'
' - CuDNN 7.6.2\n'
' - Magma 2.5.1\n'
' - Build settings: BLAS=OpenBLAS, BUILD_NAMEDTENSOR=OFF, '
'BUILD_TYPE=Release, CXX_FLAGS=-fvisibility-inlines-hidden -fmessage-length=0 '
'-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong '
'-fno-plt -O2 -pipe '
'-I/opt/anaconda3/conda-bld/pytorch-base_1566336264431/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/include '
'-fdebug-prefix-map=${SRC_DIR}=/usr/local/src/conda/${PKG_NAME}-${PKG_VERSION} '
'-fdebug-prefix-map=${PREFIX}=/usr/local/src/conda-prefix -Wno-deprecated '
'-fvisibility-inlines-hidden -fopenmp -DUSE_QNNPACK -O2 -fPIC -Wno-narrowing '
'-Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits '
'-Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare '
'-Wno-unused-parameter -Wno-unused-variable -Wno-unused-function '
'-Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-error=deprecated-declarations -Wno-stringop-overflow '
'-Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast '
'-fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math '
'-Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=0, '
'USE_OPENMP=1, USE_TRT=1, \n')
- OS: RHEL 7.6, on both x86 and ppc64le
- PyTorch version: 1.2.0
- Python version: Python 3.6.9 :: Anaconda, Inc.
- CUDA/cuDNN version: 10.1 / 7.6.2
- GPU models and configuration: P100 and V100
- GCC version (if compiling from source): 7.2 and 7.3
- CMake version: 3.9.4
- Versions of any other relevant libraries: