Description
🐛 Bug
In test/test_nn.py we skip the 'backward' check for sparse embeddings with low-precision types (float, half), because the precision is often too low to get reliable results on large embeddings. The same test doesn't fail for dense embeddings. There's a limit to how much precision we can expect from float and half, but it would be preferable if the sparse and dense paths were consistent, or if the reason for the difference were clearer.
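Much of the precision limit comes down to accumulation order: in float32, the same set of gradient contributions can sum to different values depending on the order of additions, and sparse and dense kernels generally accumulate in different orders. A minimal, self-contained sketch of the effect (plain Python emulating float32 rounding; this is an illustration of the numerics, not PyTorch code):

```python
import struct

def to_f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def f32_sum(values):
    """Sum left-to-right, rounding to float32 after every addition."""
    total = 0.0
    for v in values:
        total = to_f32(total + v)
    return total

# Same multiset of addends, two accumulation orders:
in_order  = f32_sum([1e8] + [1.0] * 1000 + [-1e8])  # small terms absorbed by 1e8
reordered = f32_sum([1e8, -1e8] + [1.0] * 1000)     # cancellation happens first

print(in_order)   # 0.0    -- the 1000 ones were lost to rounding
print(reordered)  # 1000.0 -- exact
```

Since the discrepancy scales with the magnitude and count of the addends, it shows up intermittently on large embeddings and rarely on small ones, which matches the flaky behavior described below.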
To Reproduce
Steps to reproduce the behavior:
In test/test_nn.py, run the following with test_backward=True and dtype=torch.float:
self._test_EmbeddingBag(False, 'sum', True, test_backward=test_backward, dtype=dtype)
Run it a number of times and it will occasionally fail. With the third parameter (sparse) set to False, we don't see failures.
Expected behavior
Limitations on precision are consistent between sparse and dense implementations of Embedding/EmbeddingBag.
Environment
[bvaughan@devgpu005.ash6 ~/repos/pytorch] python ./collect_env.py
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
OS: CentOS Linux 7 (Core)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
CMake version: version 3.12.2
Python version: 3.7
Is CUDA available: N/A
CUDA runtime version: 9.2.88
GPU models and configuration:
GPU 0: Tesla M40
GPU 1: Tesla M40
GPU 2: Tesla M40
GPU 3: Tesla M40
GPU 4: Tesla M40
GPU 5: Tesla M40
GPU 6: Tesla M40
GPU 7: Tesla M40
Nvidia driver version: 396.69
cuDNN version: /usr/local/cuda-9.2/targets/x86_64-linux/lib/libcudnn.so.7.4.2
Versions of relevant libraries:
[pip3] numpy==1.15.4
[pip3] numpydoc==0.8.0
[pip3] torch==1.1.0a0+3900816
[pip3] torchvision==0.2.1
[conda] magma-cuda92 2.4.0 1 pytorch
[conda] mkl 2019.1 144
[conda] mkl-include 2019.1 144
[conda] mkl-service 1.1.2 py37h90e4bf4_5
[conda] mkl_fft 1.0.4 py37h4414c95_1
[conda] mkl_random 1.0.1 py37h4414c95_1
[conda] mkldnn 0.16.1 0 mingfeima
[conda] torch 1.0.0a0+aaf6e36
[conda] torch 1.1.0a0+0676ba0
[conda] torch 1.0.0a0+c2f1811
[conda] torch 1.0.0a0+e387d94
[conda] torch 1.0.0a0+298b775
[conda] torch 1.0.0a0+8de9564
[conda] torch 1.0.0a0+b15242f
[conda] torch 1.0.0a0+df022f8
[conda] torch 1.0.0a0+9c20546
[conda] torch 1.0.0a0+35a24a9
[conda] torch 1.0.0a0+d4f9dbf
[conda] torch 1.0.0a0+4a4cc13
[conda] torch 1.0.0a0+e03136f
[conda] torch 1.0.0a0+c715fcc
[conda] torch 1.0.0a0+b8da44d
[conda] torch 1.0.0a0+5c51f65
[conda] torch 1.1.0a0+227c4e9
[conda] torch 1.0.0a0+66a0447
[conda] torch 1.0.0a0+fb8745e
[conda] torch 1.0.0a0+a7445ad
[conda] torch 1.0.0a0+6e0c5a8
[conda] torch 1.1.0a0+71bdfe8
[conda] torch 1.1.0a0+3900816
[conda] torch 1.0.0a0+607094c
[conda] torch 1.0.0a0+3ff7071
[conda] torch 1.0.0a0+24c43e2
[conda] torchvision 0.2.1
Additional context
Encountered while working on:
#19695