Closed
Labels
TF 2.5, comp:gpu, stat:awaiting tensorflower, type:bug
Description
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Pop-OS 20.04
- TensorFlow installed from (source or binary): source
- TensorFlow version (use command below): v2.5.0-rc3-213-ga4dfb8d1a71 2.5.0
- Python version: 3.9.5
- CUDA/cuDNN version: CUDA 11.4 / cuDNN 8.2.2
- GPU model and memory: RTX 3080
Describe the current behavior
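With TF_GPU_ALLOCATOR=cuda_malloc_async set, TensorFlow raises an InternalError ("No allocator statistics") while initializing the eager context, right after the CUDA malloc Async allocator is selected for the GPU: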
2021-07-08 12:44:26.553800: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-08 12:44:27.009583: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-08 12:44:27.034925: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.035193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.71GHz coreCount: 68 deviceMemorySize: 9.76GiB deviceMemoryBandwidth: 707.88GiB/s
2021-07-08 12:44:27.035207: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-08 12:44:27.036831: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-08 12:44:27.036855: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-08 12:44:27.037745: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-08 12:44:27.037863: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-08 12:44:27.038095: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-08 12:44:27.038451: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-08 12:44:27.038515: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-08 12:44:27.038573: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.038841: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.039405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-08 12:44:27.039873: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-08 12:44:27.040284: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.040520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.71GHz coreCount: 68 deviceMemorySize: 9.76GiB deviceMemoryBandwidth: 707.88GiB/s
2021-07-08 12:44:27.040556: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.040800: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.041145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-08 12:44:27.041164: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-08 12:44:27.296173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-08 12:44:27.296199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-07-08 12:44:27.296206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-07-08 12:44:27.296339: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.296598: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.296835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-08 12:44:27.297045: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:210] Using CUDA malloc Async allocator for GPU.
2021-07-08 12:44:27.297081: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
Traceback (most recent call last):
File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/./training.py", line 424, in <module>
log_writer = tf.summary.create_file_writer(logdir) if log else None
File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/facecapsnet/lib/python3.9/site-packages/tensorflow/python/ops/summary_ops_v2.py", line 479, in create_file_writer_v2
with ops.name_scope(name, "create_file_writer") as scope, ops.device("cpu:0"):
File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/facecapsnet/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 5255, in device
return context.device(device_name_or_function)
File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/facecapsnet/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 2072, in device
ensure_initialized()
File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/facecapsnet/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1867, in ensure_initialized
context().ensure_initialized()
File "/home/sebltm/OneDrive/KCL/Individual_Project/FaceCapsNet/facecapsnet/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 525, in ensure_initialized
context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: No allocator statistics
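A minimal reproduction sketch, based only on the traceback above. The helper names `select_gpu_allocator` and `create_writer` are hypothetical and are not part of training.py. The log directory is a temporary placeholder. Setting the variable before importing TensorFlow is an assumption made to be safe; exporting it in the shell should be equivalent.

```python
import os
import tempfile


def select_gpu_allocator(name="cuda_malloc_async"):
    """Set TF_GPU_ALLOCATOR and return its previous value (None if unset).

    Assumption: call this before TensorFlow is imported, so the setting
    is in place when the GPU devices are initialized.
    """
    previous = os.environ.get("TF_GPU_ALLOCATOR")
    os.environ["TF_GPU_ALLOCATOR"] = name
    return previous


def create_writer(logdir=None):
    """Create a summary file writer, the call that fails in the traceback.

    TensorFlow is imported here, not at module level, so the environment
    variable set by select_gpu_allocator() is already visible to it.
    """
    import tensorflow as tf

    if logdir is None:
        logdir = tempfile.mkdtemp()  # placeholder log directory
    # Entering the device scope inside this call initializes the eager
    # context; on the setup described above that is where
    # "InternalError: No allocator statistics" was raised.
    return tf.summary.create_file_writer(logdir)
```

Calling `select_gpu_allocator()` followed by `create_writer()` on the affected build should hit the same code path as line 424 of training.py. Leaving TF_GPU_ALLOCATOR unset and running the same call is a quick way to check whether the allocator setting is what triggers the error.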