CUDA 13.0 Segmentation Fault in test/test_numba_integration.py::TestNumbaIntegration::test_array_adaptor #162878

@atalman

🐛 Describe the bug

Related to #162531

Test:

test/test_numba_integration.py::TestNumbaIntegration::test_array_adaptor

Call Stack:

test_numba_integration.py::TestNumbaIntegration::test_active_device SKIPPED [0.0004s] (No multigpu) [ 12%]
test_numba_integration.py::TestNumbaIntegration::test_array_adaptor Fatal Python error: Segmentation fault

Current thread 0x00007f6f7a919440 (most recent call first):
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 326 in safe_cuda_api_call
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 505 in __enter__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 121 in ensure_context
  File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 135 in __enter__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 231 in _require_cuda_context
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda/api.py", line 76 in as_cuda_array
  File "/var/lib/jenkins/workspace/test/test_numba_integration.py", line 140 in test_array_adaptor
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3223 in wrapper
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 549 in _callTestMethod
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 591 in run
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3375 in _run_custom
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3405 in run
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 650 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/unittest.py", line 333 in runtest
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 341 in from_call
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pytest_rerunfailures.py", line 549 in pytest_runtest_protocol
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 323 in _main
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 269 in wrap_session
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/config/__init__.py", line 166 in main
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1298 in run_tests
  File "/var/lib/jenkins/workspace/test/test_numba_integration.py", line 399 in <module>
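Because the crash takes down the whole interpreter, it can help to run the failing call in a child process and inspect how that child exited. The following is a minimal sketch, not the actual test: `run_isolated` and `REPRO_SNIPPET` are hypothetical names, the snippet only assumes the test reaches `numba.cuda.as_cuda_array` on a CUDA tensor (as the traceback suggests), and the signal decoding assumes a POSIX system where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys

# Hypothetical reproduction, guessed from the traceback above: it hands a
# CUDA tensor to numba.cuda.as_cuda_array. It needs torch, numba and a GPU.
REPRO_SNIPPET = """
import torch
from numba import cuda

t = torch.arange(10, dtype=torch.float32, device="cuda")
arr = cuda.as_cuda_array(t)
print(arr.shape)
"""


def run_isolated(code, timeout=300):
    """Run `code` in a fresh interpreter so a segfault cannot kill the caller."""
    proc = subprocess.run(
        # -X faulthandler makes the child dump a Python traceback on SIGSEGV.
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    sig = None
    if proc.returncode < 0:
        # On POSIX, a negative return code is the negated signal number.
        try:
            sig = signal.Signals(-proc.returncode).name
        except ValueError:
            sig = f"signal {-proc.returncode}"
    return {"returncode": proc.returncode, "signal": sig, "stderr": proc.stderr}
```

On an affected machine, `run_isolated(REPRO_SNIPPET)` would be expected to report `SIGSEGV` along with the faulthandler dump in `stderr`, while the parent process keeps running and can log the result.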

CUDA driver versions tested: 580.65.06, 580.82.07
Numba versions tested: numba==0.55.2, numba==0.60.0
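For anyone comparing against the versions listed above, the same details can be gathered with a short script. This is a sketch under assumptions: the helper names are hypothetical, package versions are read from installed metadata without importing the packages (so the script itself cannot trigger the crash), and the driver version is taken from `nvidia-smi` if it is on `PATH`.

```python
import shutil
import subprocess
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def driver_versions():
    """Return one driver version string per GPU, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        out = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=True,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]


def collect_versions():
    """Collect the versions most relevant to this report."""
    return {
        "torch": package_version("torch"),
        "numba": package_version("numba"),
        "driver": driver_versions(),
    }


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```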

Someone else appears to be hitting a similar issue:
https://forums.developer.nvidia.com/t/cuda-13-0-segmentation-fault-from-line-cuda-jit-tuple-float64-float64-float64-float64-device-true/341958

This looks like a possible issue with the CUDA driver.

cc @ptrblck @msaroufim @eqy @jerryzh168 @seemethere @malfet @pytorch/pytorch-dev-infra @nWEIdia @tinglvv @Aidyn-A

Versions

2.10.0

Metadata

Labels

module: ci (Related to continuous integration)
module: cuda (Related to torch.cuda, and CUDA support in general)
oncall: releng (In support of CI and Release Engineering)