
torch.nn.DataParallel causes incorrect gradients #23938

@xiaoguai0992

Bug Report

Issue description

I have a model with two nn.Conv2d modules, and only the first one is used in forward().

After calling loss.backward(), the gradients of all parameters of the second Conv2d (the unused one) should be None.

Without nn.DataParallel, I got the correct result (conv2.weight.grad is None).

However, with nn.DataParallel, conv2.weight.grad is a zero tensor instead of None. As a result, if I run optimizer.step() after the backward pass, weight_decay and momentum are applied to the unused parameters, which causes unexpected results. I would expect the gradients of unused parameters to stay None instead of becoming zero tensors.

I have a temporary workaround, but it may cause other problems when a gradient really is all zeros:

loss.backward()
for p in model.parameters():
    # Treat an all-zero gradient as "parameter not used in forward"
    # and reset it to None so the optimizer skips it entirely.
    if p.grad is not None and torch.sum(torch.abs(p.grad)) == 0.0:
        p.grad = None
optimizer.step()

So why does this problem occur, and what is the correct way to fix it?
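
A less fragile interim alternative than the zero-gradient check above would be to hand only the parameters that are actually used in forward() to the optimizer, so weight_decay and momentum never touch conv2 at all. A sketch (the 'module.conv2' prefix and the SGD settings are illustrative and assume the DataParallel-wrapped two-conv model):

# Register only the parameters that are actually used in forward();
# the unused conv2 parameters are then never seen by the optimizer.
used_params = [p for name, p in model.named_parameters()
               if not name.startswith('module.conv2')]
optimizer = torch.optim.SGD(used_params, lr=0.1, momentum=0.9,
                            weight_decay=1e-4)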

Code example

See https://gist.github.com/xiaoguai0992/db8742c3fa7a5e02be36e64180693752

The output of the code is:

Testing non-dataparallel.
conv1.weight, p.grad is None = False
conv1.bias, p.grad is None = False
conv2.weight, p.grad is None = True
conv2.bias, p.grad is None = True
Testing dataparallel
module.conv1.weight, p.grad is None = False
module.conv1.bias, p.grad is None = False
module.conv2.weight, p.grad is None = False
module.conv2.bias, p.grad is None = False
Testing repaired version
module.conv1.weight, p.grad is None = False
module.conv1.bias, p.grad is None = False
module.conv2.weight, p.grad is None = True
module.conv2.bias, p.grad is None = True
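
For reference, a minimal reproducer along the same lines as the gist (not its exact script; the layer sizes, input shape, and multi-GPU assumption are illustrative) looks roughly like this:

import torch
import torch.nn as nn

class TwoConv(nn.Module):
    def __init__(self):
        super(TwoConv, self).__init__()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
        self.conv2 = nn.Conv2d(3, 8, 3, padding=1)  # never used in forward

    def forward(self, x):
        # conv2 is intentionally skipped, so its gradients should stay None
        return self.conv1(x)

def report(model, x):
    for p in model.parameters():
        p.grad = None  # start from a clean slate
    model(x).sum().backward()
    for name, p in model.named_parameters():
        print('{}, p.grad is None = {}'.format(name, p.grad is None))

x = torch.randn(8, 3, 32, 32).cuda()

print('Testing non-dataparallel.')
report(TwoConv().cuda(), x)

print('Testing dataparallel')
report(nn.DataParallel(TwoConv().cuda()), x)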

System Info

PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.5
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti
GPU 3: GeForce RTX 2080 Ti

Nvidia driver version: 418.40.04
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.16.2
[pip3] torch==1.1.0
[pip3] torchvision==0.2.2.post3
[conda] Could not collect


Labels

module: autograd (Related to torch.autograd and the autograd engine in general)
oncall: distributed (Add this issue/PR to the distributed oncall triage queue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
