
Internal Error with TF_GPU_ALLOCATOR=cuda_malloc_async #50669

@sebltm

Description

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Pop-OS 20.04
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): v2.5.0-rc3-213-ga4dfb8d1a71 2.5.0
  • Python version: 3.9.5
  • CUDA/cuDNN version: CUDA 11.4 / cuDNN 8.2.2
  • GPU model and memory: RTX 3080
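
The version string above is in the GIT_VERSION / VERSION format, as produced by the usual snippet from the issue template:

python3 -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"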

Describe the current behavior
When TF_GPU_ALLOCATOR=cuda_malloc_async is set, TensorFlow raises an internal error ("No allocator statistics") right after selecting the CUDA malloc async allocator, while initializing the GPU device:

2021-07-08 12:44:26.553800: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-08 12:44:27.009583: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-08 12:44:27.034925: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.035193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.71GHz coreCount: 68 deviceMemorySize: 9.76GiB deviceMemoryBandwidth: 707.88GiB/s
2021-07-08 12:44:27.035207: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-08 12:44:27.036831: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-08 12:44:27.036855: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-08 12:44:27.037745: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-08 12:44:27.037863: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-08 12:44:27.038095: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-08 12:44:27.038451: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-08 12:44:27.038515: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-08 12:44:27.038573: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.038841: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.039405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-08 12:44:27.039873: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-08 12:44:27.040284: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.040520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.71GHz coreCount: 68 deviceMemorySize: 9.76GiB deviceMemoryBandwidth: 707.88GiB/s
2021-07-08 12:44:27.040556: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.040800: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.041145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-08 12:44:27.041164: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-08 12:44:27.296173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-08 12:44:27.296199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-08 12:44:27.296206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-08 12:44:27.296339: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.296598: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.296835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.297045: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:210] Using CUDA malloc Async allocator for GPU.
2021-07-08 12:44:27.297081: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
Traceback (most recent call last):
  File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/./training.py", line 424, in <module>
    log_writer = tf.summary.create_file_writer(logdir) if log else None
  File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/facecapsnet/lib/python3.9/site-packages/tensorflow/python/ops/summary_ops_v2.py", line 479, in create_file_writer_v2
    with ops.name_scope(name, "create_file_writer") as scope, ops.device("cpu:0"):
  File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/facecapsnet/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 5255, in device
    return context.device(device_name_or_function)
  File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/facecapsnet/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 2072, in device
    ensure_initialized()
  File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/facecapsnet/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1867, in ensure_initialized
    context().ensure_initialized()
  File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/facecapsnet/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 525, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: No allocator statistics
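
For context, a minimal sketch along these lines should reproduce the failure; the log directory path is a placeholder, and any op that triggers eager-context initialization (not just the summary writer) would hit the same error:

import os
# The allocator has to be selected before TensorFlow initializes the GPU context,
# so set the environment variable before importing tensorflow.
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

import tensorflow as tf

# Same call as in the traceback above; it forces eager-context initialization,
# which is where the InternalError is raised.
log_writer = tf.summary.create_file_writer("/tmp/tf_logs")  # placeholder logdir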
