
Conversation

@goldsborough
Contributor

This PR changes how Python variables (torch.tensor) are converted to ATen Tensors/autograd Variables. Previously, we unwrapped variables into plain tensors on the Python -> C++ path, thereby losing their autograd history, and then created new variables from the tensors on the C++ -> Python path. The major drawback of this is that it is currently not possible for users to define tensors from C++ (e.g. in extensions) and have them work with autograd.

Now, Python variables are dynamically converted (i.e. "upcast") to tensors on the Python->C++ path, and then re-wrapped into variables (i.e. without creating a new variable, just using the Variable constructor).

For this, I had to add a function to our public C++ api that wraps tensors into variables. To avoid confusion, we want to keep users unaware of the concept of a Variable in C++. Therefore we're not using torch::autograd::make_variable which returns a Variable, but instead torch::as_variable() which returns a Tensor and is declared inside torch/torch.h and defined out-of-sight in torch/csrc/torch.cpp (new file). I went for as_variable because it's a bit clearer and also to avoid confusion with make_variable on our side (users never see torch::autograd::make_variable). Happy to hear thoughts on this.
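
For illustration, here is a minimal sketch of how an extension author might use the new helper. The function scaled_copy and its body are made up for this example; only torch::as_variable() and the torch/torch.h header come from this PR.

```cpp
#include <torch/torch.h>

// Hypothetical extension function: compute a result as a plain ATen tensor
// and wrap it with torch::as_variable() so that, once returned to Python,
// it participates in autograd instead of arriving as a raw tensor.
at::Tensor scaled_copy(at::Tensor input, double scale) {
  at::Tensor raw = input.mul(scale);  // ordinary ATen op, produces a plain tensor
  return torch::as_variable(raw);     // returns an at::Tensor backed by a Variable
}
```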

In the JIT we do want to unwrap/rewrap (according to @zdevito), so we do that manually there.

@zdevito @colesbury @ezyang @apaszke

@apaszke
Contributor

apaszke left a comment

I don't think that as_variable is a good solution. If I understand correctly, the problem is that user extensions generally will get Variables, but if they allocate tensors inside, they will not be wrapped. In my opinion the way we should go about this would be to never let people work on raw tensors directly.

Otherwise our extension API will be very error-prone, since tensors you get from ATen will generally behave differently from tensors you allocated yourself (I don't think you can safely mix them when calling different ops). Also, what happens if you forget to wrap them and don't have a debug build of PyTorch (almost no one does)? Looks like a segfault to me.

Ensuring that everything users see is a Variable will make it safe, and will make the extension API equivalent to running the Python code, but faster.

@goldsborough
Contributor Author

I see your point. At the same time, right now if you create a new at::Tensor inside C++, users might be confused as to why there's no way to make it work with autograd. I ran into this issue with https://github.com/pytorch/pytorch/blob/master/test/cpp_extensions/extension.cpp#L9 myself, where I had a tensor stored inside a class and then wanted to backward through it.

I don't think there's a way to restrict users from creating "raw tensors" right now, anyone could allocate a new tensor -- do we want to just say it's unsupported?

I imagine the safest thing to do would be to add a runtime check to see whether the tensor actually is a Variable and, if not, either call make_variable() or throw an exception (the latter is probably better). But that's also expensive.

I'm not opposed to the idea that all tensors have to be passed in externally (although it does mean C++ extensions would have to be pure/functional and thus the idea of binding classes goes away, e.g. for MatrixMultiplier in the tests), but that still wouldn't be safe as it stands because there's nothing to prevent users from using Tensor.
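
To make the class-with-state concern concrete, here is roughly the pattern in question -- a sketch in the spirit of the MatrixMultiplier test class, not its actual code; the Type-based factory call is only illustrative of the ATen API of the time.

```cpp
#include <torch/torch.h>

// Sketch only: a C++ class that owns tensor state. If weights_ stays a raw
// at::Tensor, nothing computed from it can be backwarded through from Python;
// wrapping it via torch::as_variable() is what this PR makes possible.
struct Multiplier {
  Multiplier(int64_t rows, int64_t cols)
      : weights_(torch::as_variable(at::CPU(at::kFloat).ones({rows, cols}))) {}

  at::Tensor forward(at::Tensor input) {
    return input.mm(weights_);  // with both operands wrapped, autograd can record this op
  }

  at::Tensor weights_;  // state created and owned on the C++ side
};
```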

@goldsborough goldsborough force-pushed the variable-python branch 2 times, most recently from b57b7d0 to a000f53 Compare March 6, 2018 19:48
@apaszke
Contributor

apaszke commented Mar 7, 2018

I think for now it's safer to disallow passing tensors that require grad into the extensions. If they don't require grad, they would get unwrapped in the binding code and rewrapped again when going back to Python. At least this ensures that it's hard to shoot yourself in the foot and that all at::Tensors are really tensors. In the future we might want to add an unwrap_variables flag which, if set to false, will skip the unwrapping and rewrapping.
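
In rough pseudocode, the boundary being proposed would look like the sketch below; unwrap(), rewrap(), and user_kernel() are placeholders for this illustration, not functions that exist in the codebase.

```cpp
#include <torch/csrc/autograd/variable.h>

#include <stdexcept>

// Placeholders for this sketch only (not real functions in the codebase).
at::Tensor unwrap(const torch::autograd::Variable& v);
at::Tensor rewrap(const at::Tensor& t);
at::Tensor user_kernel(const at::Tensor& t);

// Conceptual boundary: reject grad-requiring inputs, strip the Variable
// wrapper on the way into the extension, and put it back on the way out,
// so the extension body only ever sees plain at::Tensors.
at::Tensor call_extension(const torch::autograd::Variable& input) {
  if (input.requires_grad()) {
    throw std::runtime_error(
        "C++ extensions cannot (yet) take Variables that require grad");
  }
  at::Tensor raw = unwrap(input);        // Variable -> underlying tensor
  at::Tensor result = user_kernel(raw);  // the user's extension code
  return rewrap(result);                 // tensor -> Variable for Python
}
```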

@fmassa
Member

fmassa commented Mar 7, 2018

Wait, so does that mean we can't write a new autograd.Function using C++ extensions?
Also, prior to this PR, was the backward graph broken if we used C++ extensions in an autograd.Function?

@apaszke
Contributor

apaszke commented Mar 7, 2018

No, it's not currently possible (both before and after this patch). You can expose forward and backward functions to Python and use those to implement a Function using our public API.
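
A sketch of that pattern is below. The sigmoid kernels are just an example; the PYBIND11_MODULE/TORCH_EXTENSION_NAME part is the usual C++ extension boilerplate, and the autograd.Function that ties forward and backward together would be written in Python on top of these bindings.

```cpp
#include <torch/torch.h>

// Plain kernels with no autograd knowledge: the Python-side autograd.Function
// calls forward() in its forward and backward() in its backward.
at::Tensor sigmoid_forward(at::Tensor input) {
  return input.sigmoid();
}

at::Tensor sigmoid_backward(at::Tensor grad_output, at::Tensor output) {
  // d(sigmoid)/dx = s * (1 - s), expressed via the saved forward output
  return grad_output * (output - output * output);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("forward", &sigmoid_forward, "sigmoid forward");
  m.def("backward", &sigmoid_backward, "sigmoid backward");
}
```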

@apaszke
Contributor

apaszke commented Mar 7, 2018

And no, variables inside forward don’t require grad and their history is irrelevant (forward runs in no grad mode). If you used the extension in backward you should have used once_differentiable or implemented and applied a backward autograd function.

@zdevito
Contributor

zdevito commented Mar 7, 2018

We have use cases where we want to compose operators inside of C++ and have their gradients recorded. So we do need to support requires_grad in C++.

@apaszke
Contributor

apaszke commented Mar 7, 2018

@zdevito I agree this is a useful mode to have. I'm just saying that we should be careful when designing the API that is meant to support variables, because one should never mix both modes unless they really know what they're doing. The type system doesn't help in catching those errors at all, and they will lead to memory corruption or segfaults, and that's just going to be an endless source of frustration for users.

@goldsborough
Contributor Author

I've revamped the API

@ezyang
Contributor

ezyang commented Mar 8, 2018

@pytorchbot retest this please

@goldsborough
Contributor Author

I really do not understand setup.py AT ALL. On my machine it happily installs torch/csrc/autograd/generated/VariableType.h, but not in CI. It obviously has to do with the fact that it's a generated file, but it's so incredibly non-obvious where and how and when files get generated and when they get copied into what temporary install directory. It's really a pain

@goldsborough
Contributor Author

Tests are looking green. I think someone (@apaszke, @colesbury, @zdevito) should take another look to sign off or raise any remaining concerns. I believe @zdevito is waiting for this one to land soon.

@apaszke
Contributor

apaszke commented Mar 8, 2018

The code looks good, but we're still hardly doing any checks (only in requires_grad + set_requires_grad), so it's trivial to shoot yourself in the foot, and it's incompatible with our previous API in a way that will cause segfaults.

@goldsborough
Contributor Author

goldsborough commented Mar 8, 2018

OK, just so I understand: where should I add more checks? Do you mean in torch internals to make sure the unwrapping works, or in our user-facing API (though it's just those two methods right now)? As for internals, I believe I've found all the cases where we were expecting this wrapping/re-wrapping behavior, and it seemed to be only in the JIT (which has the special conversion methods to preserve the old behavior).

@zdevito
Contributor

zdevito commented Mar 9, 2018

I think we should merge this pretty much as is, perhaps the only change being to remove the DEBUG guards on that conversion. Even before this change, having users use Tensors/Variables from C++ was subject to segfaults because of the unchecked constructors. This change was never meant to address that. Instead, it was meant to make the conversion from Python<->C++ tensors consistent regardless of whether the static type in C++ is marked Tensor or Variable. This is needed to clean up several places in the JIT where we need to do list copies/static casts just to get pybind to cooperate. I agree that before we mark C++ Tensors non-alpha quality we need to make sure all the conversions are checked, but I don't think that has to be done in the same commit as these changes.

@apaszke
Contributor

apaszke commented Mar 9, 2018

I'm ok with merging this if we at least do the checks that we never return raw tensors to Python (and raise a clear error instead of corrupting memory). Also, we have docs for these functions now, and people are starting to use them, so if we're planning to do more breaking changes like this one, please let's clearly mark the API as experimental in the docs.
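
Roughly the shape of that guard on the C++ -> Python path; is_wrapped_in_variable() is a placeholder for whatever predicate the expensive check (and later the cheap one) provides.

```cpp
#include <ATen/ATen.h>

#include <stdexcept>

// Placeholder predicate for this sketch (the real check is the expensive
// one first, then the cheap flag-based check in a follow-up PR).
bool is_wrapped_in_variable(const at::Tensor& t);

// Guard on the C++ -> Python path: refuse to hand a raw at::Tensor back to
// Python and raise a clear error instead of risking memory corruption.
at::Tensor checked_return(const at::Tensor& result) {
  if (!is_wrapped_in_variable(result)) {
    throw std::runtime_error(
        "expected a Variable when returning to Python, got a raw at::Tensor");
  }
  return result;
}
```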

@zdevito
Contributor

zdevito commented Mar 9, 2018

@apaszke I agree, we should add the dynamic check when converting C++ -> Python. @colesbury: yeah, let's make that check cheap using Ed's modifications. Let's do this in two steps: first modify this PR to use the expensive check (none of the current pybind paths are perf-critical), and then do a follow-up PR that replaces it with the fast check. This PR is making some JIT PRs harder because I keep forgetting to cast at::Tensors to Variable before sending to pybind.

@goldsborough
Contributor Author

Zach meant: "Not having this PR is making some JIT PRs harder because I keep forgetting to cast at::Tensors to Variable before sending to pybind."

@goldsborough
Contributor Author

I've enabled variableness checks always and everywhere now. Let's merge this and then add Ed's cheap variable check on top asap.

@zdevito zdevito merged commit 7391dae into pytorch:master Mar 9, 2018
@goldsborough goldsborough deleted the variable-python branch March 9, 2018 23:12