Open
Labels
- module: ci (Related to continuous integration)
- module: cuda (Related to torch.cuda, and CUDA support in general)
- oncall: releng (In support of CI and Release Engineering)
Description
🐛 Describe the bug
Related to #162531
Test:
test/test_numba_integration.py::TestNumbaIntegration::test_array_adaptor
Call Stack:
test_numba_integration.py::TestNumbaIntegration::test_active_device SKIPPED [0.0004s] (No multigpu) [ 12%]
test_numba_integration.py::TestNumbaIntegration::test_array_adaptor Fatal Python error: Segmentation fault
Current thread 0x00007f6f7a919440 (most recent call first):
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 326 in safe_cuda_api_call
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 505 in __enter__
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 121 in ensure_context
File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 135 in __enter__
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 231 in _require_cuda_context
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda/api.py", line 76 in as_cuda_array
File "/var/lib/jenkins/workspace/test/test_numba_integration.py", line 140 in test_array_adaptor
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3223 in wrapper
File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 549 in _callTestMethod
File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 591 in run
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3375 in _run_custom
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3405 in run
File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 650 in __call__
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/unittest.py", line 333 in runtest
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 262 in <lambda>
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 341 in from_call
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 222 in call_and_report
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 133 in runtestprotocol
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pytest_rerunfailures.py", line 549 in pytest_runtest_protocol
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 323 in _main
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 269 in wrap_session
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/config/__init__.py", line 166 in main
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1298 in run_tests
File "/var/lib/jenkins/workspace/test/test_numba_integration.py", line 399 in <module>
CUDA driver versions tested: 580.65.06, 580.82.07
Numba versions tested: numba==0.55.2, numba==0.60.0
Someone else appears to be hitting a similar issue:
https://forums.developer.nvidia.com/t/cuda-13-0-segmentation-fault-from-line-cuda-jit-tuple-float64-float64-float64-float64-device-true/341958
This looks like a possible issue with the CUDA driver.
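For anyone trying to reproduce this outside the test suite, here is a minimal sketch that runs the same `numba.cuda.as_cuda_array` call from the trace in a fresh interpreter, so a segfault shows up as a signal exit code instead of killing the calling process. The tensor shape and dtype in `CHILD` and the helper names `run_isolated` and `describe` are my own assumptions for illustration; they are not taken from `test_numba_integration.py`.

```python
import signal
import subprocess
import sys

# Child script: mirrors the call at the top of the trace
# (numba.cuda.as_cuda_array on a CUDA tensor). The tensor itself is an
# arbitrary choice, not the one used by the real test.
CHILD = """
import torch
import numba.cuda

t = torch.arange(10, dtype=torch.float32, device="cuda")
a = numba.cuda.as_cuda_array(t)
print(a.shape)
"""


def run_isolated(code: str, timeout: float = 120.0) -> int:
    """Run `code` in a fresh interpreter and return its exit code."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode


def describe(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX a negative return code means the child died from a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with {returncode}"


if __name__ == "__main__":
    print(describe(run_isolated(CHILD)))
```

If the driver is at fault, I would expect this to report `killed by SIGSEGV` on an affected machine. A plain non-zero exit more likely means torch, numba, or a CUDA device is missing in that environment.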
cc @ptrblck @msaroufim @eqy @jerryzh168 @seemethere @malfet @pytorch/pytorch-dev-infra @nWEIdia @tinglvv @Aidyn-A
Versions
2.10.0