
Conversation

@yf225 (Contributor) commented May 28, 2019

#17072 breaks model.to(xla_device): moving a model to an XLA device involves changing its parameters' TensorImpl type, and the current implementation of nn.Module.to() doesn't support changing a module parameter's TensorImpl type:

# https://github.com/pytorch/pytorch/blob/6dc445e1a84dc5d093d640de54f038f021d13227/torch/nn/modules/module.py#L192-L208
def _apply(self, fn):
    ...
    for param in self._parameters.values():
        if param is not None:
            # Tensors stored in modules are graph leaves, and we don't
            # want to create copy nodes, so we have to unpack the data.
            param.data = fn(param.data)  # NOTE: this doesn't allow changing `param.data`'s TensorImpl type
            if param._grad is not None:
                param._grad.data = fn(param._grad.data)  # NOTE: this doesn't allow changing `param._grad.data`'s TensorImpl type
    ...
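
For illustration, a minimal sketch of the failure mode (this assumes the torch_xla package and its xm.xla_device() helper are available; the exact error raised may differ):

import torch
import torch_xla.core.xla_model as xm  # assumed import path for torch_xla

model = torch.nn.Linear(2, 2)
param = next(model.parameters())

# After #17072, assigning into `param.data` is not allowed to change the
# parameter's TensorImpl type, so the CPU-to-XLA move inside _apply() fails.
xla_device = xm.xla_device()
param.data = param.data.to(xla_device)  # expected to raise, because the XLA
                                        # tensor uses a different TensorImpl type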

A hypothetical way to fix this is to do the following:

def _apply(self, fn):
    ...
    for key, param in self._parameters.items():
        if param is not None:
            # Tensors stored in modules are graph leaves, and we don't
            # want to create copy nodes, so we have to unpack the data.
            param_applied = fn(param)
            if param._is_same_impl_type(param_applied):
                param.data = param_applied
            else:  # If we have to change TensorImpl type...
                with torch.no_grad():
                    # We use `requires_grad_()` here, to make sure the new `param` still
                    # has the same `requires_grad` value as the old `param`. An alternative is
                    # to not use `with torch.no_grad():`, but that would cause the following operation
                    # to create a `CopyBackwards` gradient function which is not what we wanted.
                    self._parameters[key] = param_applied.requires_grad_(param.requires_grad)
            if param._grad is not None:
                grad_applied = fn(param._grad)
                if param._grad._is_same_impl_type(grad_applied):
                    param._grad.data = grad_applied
                else:  # If we have to change TensorImpl type...
                    self._parameters[key]._grad = grad_applied
    ...

However, the biggest problem with this approach is that it makes the model.to(device) API less predictable. If we move a model from CPU to CUDA, all previous references to param are preserved, because we use param.data = fn(param.data); if we move a model from CPU to an XLA device, all previous references to param are broken, because we assign new tensors to param. To keep the model.to(device) API consistent, we will change the CPU-to-CUDA path to also break previous references to the model's parameters. This will happen in two stages: first a deprecation warning when we detect previous references to those parameters, then a hard error in future releases.
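
For illustration, a minimal sketch of the current reference-preserving behavior on the CPU-to-CUDA path (assuming a CUDA device is available):

import torch

model = torch.nn.Linear(2, 2)
w_before = model.weight            # hold a reference to a parameter

model.to('cuda')                   # in-place move via param.data = fn(param.data)

# The Parameter object is reused, so the old reference still sees the moved tensor.
print(w_before is model.weight)    # True
print(w_before.device)             # cuda:0

# Under the hypothetical _apply() above, moving to an XLA device would instead
# rebind self._parameters['weight'] to a new tensor, and `w_before` would keep
# pointing at the old CPU tensor.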

cc. @ailzhang

TODO:
Add explanations for why the following cases are no longer supported:

@pytorchbot added labels: module: autograd, module: internals, module: nn, module: pybind (May 28, 2019)
@yf225 force-pushed the fix_xla branch 7 times, most recently from f75249a to 6cc1468 (May 29, 2019 01:52)
Will Feng added 2 commits (May 29, 2019 09:20)
@yf225 changed the title from "[WIP] Fix XLA test" to "Preserve previous references to model.param when we call model.to(xla_device)" (May 30, 2019)
@yf225 changed the title from "Preserve previous references to model.param when we call model.to(xla_device)" to "Fix model.to(xla_device)" (May 30, 2019)
@ezyang self-requested a review (May 30, 2019 20:55)
@yf225 force-pushed the fix_xla branch 18 times, most recently from 81ffc4b to 9f1ac9c (June 7, 2019 19:28)
Contributor commented:

can't we just make this a native function? Then you wouldn't need to make your own parsing either.

Contributor commented:

This comment should refer to is_same_impl_type.

test/test_nn.py Outdated
Contributor commented:

shouldn't this be cuda()?

test/test_nn.py Outdated
Contributor commented:

please comment what this is attempting to test.

Contributor commented:

I don't think we should warn yet: there is no alternative they can use yet if they really want to hold on to a reference to another tensor. We need new APIs that we can direct people to first.

Contributor commented:

Also, is it possible to name the type of the module? It might not be obvious, because you can move an entire model and a param in some submodule changes.

Contributor commented:

can we please break up this PR? This is trying to do a bunch of different things:

  1. Fix moving to XLA
  2. Do proper version tracking of module parameters (sometimes)
  3. Warn about future breaking changes (without first introducing correct APIs).

Contributor commented:

why doesn't this follow the other pattern of using no_grad and setting requires_grad at the end?

with torch.no_grad():
    # We use `.requires_grad_()` here to make sure the new `param` still
    # has the same `requires_grad` value as the old `param`.
    self._parameters[key] = param_applied.requires_grad_(param.requires_grad)
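
For context, a minimal sketch of the difference this pattern makes (using a CPU-to-CUDA move as a stand-in and assuming a CUDA device is available): outside torch.no_grad() the copy is tracked by autograd and picks up a grad_fn such as CopyBackwards, so it is no longer a leaf; under no_grad it stays a leaf and requires_grad_() restores the flag.

import torch

p = torch.nn.Parameter(torch.randn(3))

copied = p.to('cuda')              # tracked by autograd: gets a grad_fn, not a leaf
print(copied.grad_fn is None)      # False
print(copied.is_leaf)              # False

with torch.no_grad():
    copied = p.to('cuda').requires_grad_(p.requires_grad)
print(copied.grad_fn is None)      # True
print(copied.is_leaf)              # True
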
Contributor commented:

I don't think this is correct, won't the parameter not even be an nn.Parameter anymore?
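
A minimal sketch of this concern (assuming a CUDA device as a stand-in for the device move): most operations on an nn.Parameter, including a device-changing .to(), return a plain torch.Tensor, so the rebound entry would need to be re-wrapped to remain a Parameter.

import torch

p = torch.nn.Parameter(torch.randn(3))
moved = p.to('cuda')

print(type(p).__name__)       # Parameter
print(type(moved).__name__)   # Tensor -- the Parameter subclass is not preserved

# The hypothetical _apply() above would need something like
# self._parameters[key] = torch.nn.Parameter(moved) to keep it a Parameter.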

@yf225 closed this Jun 18, 2019