Bug Report
Issue description
I have a model with two nn.Conv2d modules, and only the first of them is used in forward().
In that case, after calling loss.backward(), the gradients of the second Conv2d (the unused one) should be None.
Without nn.DataParallel, I get the correct result (conv2.weight.grad is None).
However, with nn.DataParallel, conv2.weight.grad is a zero tensor instead of None. As a result, if I run optimizer.step() after backward, weight_decay and momentum are accumulated for the unused parameters, which causes unexpected results. I would like the gradients of unused parameters to stay None instead of becoming zero tensors.
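For reference, a minimal sketch of the setup described above (the class name, channel counts, and kernel size are illustrative assumptions, not taken from my actual code):

import torch
import torch.nn as nn

class TwoConv(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(8, 8, kernel_size=3, padding=1)  # defined but never used

    def forward(self, x):
        # Only conv1 participates in the graph; conv2 should get no gradient.
        return self.conv1(x)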
I have a temporary workaround for this issue, but it might cause other problems (when a parameter's real gradient happens to be all zeros):

loss.backward()
for p in model.parameters():
    # Treat an all-zero gradient as "this parameter was unused" and reset it.
    # This misfires if a used parameter's true gradient is exactly zero.
    if p.grad is not None and torch.sum(torch.abs(p.grad)) == 0.0:
        p.grad = None
optimizer.step()

So why does this problem occur? And how can it be fixed correctly?
Code example
See https://gist.github.com/xiaoguai0992/db8742c3fa7a5e02be36e64180693752
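For convenience, here is a sketch of what the gist checks, reusing the TwoConv module sketched above (the input shape and the sum-based loss are assumptions; see the gist for the exact code). It runs one backward pass with and without nn.DataParallel and reports whether each parameter's .grad is None:

def report(model):
    x = torch.randn(4, 3, 16, 16).cuda()
    model(x).sum().backward()  # one forward/backward pass
    for name, p in model.named_parameters():
        print('%s, p.grad is None = %s' % (name, p.grad is None))

print('Testing non-dataparallel.')
report(TwoConv().cuda())

print('Testing dataparallel')
report(nn.DataParallel(TwoConv().cuda()))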
Running the code prints:
Testing non-dataparallel.
conv1.weight, p.grad is None = False
conv1.bias, p.grad is None = False
conv2.weight, p.grad is None = True
conv2.bias, p.grad is None = True
Testing dataparallel
module.conv1.weight, p.grad is None = False
module.conv1.bias, p.grad is None = False
module.conv2.weight, p.grad is None = False
module.conv2.bias, p.grad is None = False
Testing repaired version
module.conv1.weight, p.grad is None = False
module.conv1.bias, p.grad is None = False
module.conv2.weight, p.grad is None = True
module.conv2.bias, p.grad is None = True

System Info
PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 3.5
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti
GPU 3: GeForce RTX 2080 Ti
Nvidia driver version: 418.40.04
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.16.2
[pip3] torch==1.1.0
[pip3] torchvision==0.2.2.post3
[conda] Could not collect