
Conversation

@goldsborough
Contributor

This PR changes how Python variables (torch.tensor) are converted to ATen Tensors/autograd Variables. Previously, we unwrapped variables into plain tensors on the Python -> C++ path, thereby losing their autograd history, and then created new variables from the tensors on the C++ -> Python path. The major drawback of this is that it is currently not possible for users to define tensors from C++ (e.g. in extensions) and have them work with autograd.

Now, Python variables are dynamically converted (i.e. "upcast") to tensors on the Python->C++ path, and then re-wrapped into variables (i.e. without creating a new variable, just using the Variable constructor).

For this, I had to add a function to our public C++ api that wraps tensors into variables. To avoid confusion, we want to keep users unaware of the concept of a Variable in C++. Therefore we're not using torch::autograd::make_variable which returns a Variable, but instead torch::as_variable() which returns a Tensor and is declared inside torch/torch.h and defined out-of-sight in torch/csrc/torch.cpp (new file). I went for as_variable because it's a bit clearer and also to avoid confusion with make_variable on our side (users never see torch::autograd::make_variable). Happy to hear thoughts on this.
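
For illustration, here is a minimal sketch of how an extension author might use the new helper. The function scaled_copy and its body are made up for this example; only torch::as_variable() and the torch/torch.h header come from this PR.

```cpp
#include <torch/torch.h>

// Hypothetical extension function: compute a result as a plain ATen tensor
// and wrap it with torch::as_variable() so that, once returned to Python,
// it participates in autograd instead of arriving as a raw tensor.
at::Tensor scaled_copy(at::Tensor input, double scale) {
  at::Tensor raw = input.mul(scale);  // ordinary ATen op, produces a plain tensor
  return torch::as_variable(raw);     // returns an at::Tensor backed by a Variable
}
```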

In the JIT we do want to unwrap/rewrap (according to @zdevito), so we do that manually there.

@zdevito @colesbury @ezyang @apaszke

@apaszke
Contributor

apaszke left a comment

I don't think that as_variable is a good solution. If I understand correctly, the problem is that user extensions generally will get Variables, but if they allocate tensors inside, they will not be wrapped. In my opinion the way we should go about this would be to never let people work on raw tensors directly.

Otherwise our extension API will be very error-prone, since tensors you get from ATen will generally behave differently from tensors you allocated yourself (I don't think you can safely mix them when calling different ops). Also, what happens if you forget to wrap them and don't have a debug build of PyTorch (almost no one does)? Looks like a segfault to me.

Ensuring that everything users see is a Variable will make it safe, and will make the extension API equivalent to running the Python code, but faster.

@goldsborough
Contributor Author

I see your point. At the same time, right now if you create a new at::Tensor inside C++, users might be confused as to why there's no way to make it work with autograd. I ran into this issue with https://github.com/pytorch/pytorch/blob/master/test/cpp_extensions/extension.cpp#L9 myself, where I had a tensor stored inside a class and then wanted to backward through it.

I don't think there's a way to restrict users from creating "raw tensors" right now, anyone could allocate a new tensor -- do we want to just say it's unsupported?

I imagine the safest thing to do would be to add a runtime check to see whether the tensor actually is a Variable and, if not, either call make_variable() or throw an exception (the latter is probably better). But that's also expensive.

I'm not opposed to the idea that all tensors have to be passed in externally (although it does mean C++ extensions would have to be pure/functional and thus the idea of binding classes goes away, e.g. for MatrixMultiplier in the tests), but that still wouldn't be safe as it stands because there's nothing to prevent users from using Tensor.
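
To make the class-with-state concern concrete, here is roughly the pattern in question -- a sketch in the spirit of the MatrixMultiplier test class, not its actual code; the Type-based factory call is only illustrative of the ATen API of the time.

```cpp
#include <torch/torch.h>

// Sketch only: a C++ class that owns tensor state. If weights_ stays a raw
// at::Tensor, nothing computed from it can be backwarded through from Python;
// wrapping it via torch::as_variable() is what this PR makes possible.
struct Multiplier {
  Multiplier(int64_t rows, int64_t cols)
      : weights_(torch::as_variable(at::CPU(at::kFloat).ones({rows, cols}))) {}

  at::Tensor forward(at::Tensor input) {
    return input.mm(weights_);  // with both operands wrapped, autograd can record this op
  }

  at::Tensor weights_;  // state created and owned on the C++ side
};
```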

@goldsborough goldsborough force-pushed the variable-python branch 2 times, most recently from b57b7d0 to a000f53 Compare March 6, 2018 19:48
@apaszke
Contributor

apaszke commented Mar 7, 2018

I think for now it's safer to disallow passing tensors that require grad into the extensions. If they don't require grad, they would get unwrapped in the binding code and rewrapped again when going back to Python. At least this ensures that it's hard to shoot yourself in the foot and that all at::Tensors are really tensors. In the future we might want to add an unwrap_variables flag which, if set to false, will skip the unwrapping and rewrapping.
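
In rough pseudocode, the boundary being proposed would look like the sketch below; unwrap(), rewrap(), and user_kernel() are placeholders for this illustration, not functions that exist in the codebase.

```cpp
#include <torch/csrc/autograd/variable.h>

#include <stdexcept>

// Placeholders for this sketch only (not real functions in the codebase).
at::Tensor unwrap(const torch::autograd::Variable& v);
at::Tensor rewrap(const at::Tensor& t);
at::Tensor user_kernel(const at::Tensor& t);

// Conceptual boundary: reject grad-requiring inputs, strip the Variable
// wrapper on the way into the extension, and put it back on the way out,
// so the extension body only ever sees plain at::Tensors.
at::Tensor call_extension(const torch::autograd::Variable& input) {
  if (input.requires_grad()) {
    throw std::runtime_error(
        "C++ extensions cannot (yet) take Variables that require grad");
  }
  at::Tensor raw = unwrap(input);        // Variable -> underlying tensor
  at::Tensor result = user_kernel(raw);  // the user's extension code
  return rewrap(result);                 // tensor -> Variable for Python
}
```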

@fmassa
Member

fmassa commented Mar 7, 2018

Wait, so does that mean we can't write a new autograd.Function using C++ extensions?
Also, prior to this PR, was the backward graph broken if we used C++ extensions in an autograd.Function?

@apaszke
Contributor

apaszke commented Mar 7, 2018

No, it's not currently possible (both before and after this patch). You can expose forward and backward functions to Python and use those to implement a Function using our public API.
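
A sketch of that pattern is below. The sigmoid kernels are just an example; the PYBIND11_MODULE/TORCH_EXTENSION_NAME part is the usual C++ extension boilerplate, and the autograd.Function that ties forward and backward together would be written in Python on top of these bindings.

```cpp
#include <torch/torch.h>

// Plain kernels with no autograd knowledge: the Python-side autograd.Function
// calls forward() in its forward and backward() in its backward.
at::Tensor sigmoid_forward(at::Tensor input) {
  return input.sigmoid();
}

at::Tensor sigmoid_backward(at::Tensor grad_output, at::Tensor output) {
  // d(sigmoid)/dx = s * (1 - s), expressed via the saved forward output
  return grad_output * (output - output * output);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("forward", &sigmoid_forward, "sigmoid forward");
  m.def("backward", &sigmoid_backward, "sigmoid backward");
}
```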

@apaszke
Contributor

apaszke commented Mar 7, 2018

And no, variables inside forward don’t require grad and their history is irrelevant (forward runs in no grad mode). If you used the extension in backward you should have used once_differentiable or implemented and applied a backward autograd function.

@zdevito
Contributor

zdevito commented Mar 7, 2018

We have use cases where we want to compose operators inside of C++ and have their gradients recorded. So we do need to support requires_grad in C++.

@apaszke
Contributor

apaszke commented Mar 7, 2018

@zdevito I agree this is a useful mode to have. I'm just saying that we should be careful when designing the API that is meant to support variables, because one should never mix both modes unless they really know what they're doing. The type system doesn't help in catching those errors at all, and they will lead to memory corruption or segfaults, and that's just going to be an endless source of frustration for users.

@goldsborough
Contributor Author

I've revamped the API

@ezyang
Contributor

ezyang commented Mar 8, 2018

@pytorchbot retest this please

@goldsborough
Contributor Author

I really do not understand setup.py AT ALL. On my machine it happily installs torch/csrc/autograd/generated/VariableType.h, but not in CI. It obviously has to do with the fact that it's a generated file, but it's so incredibly non-obvious where and how and when files get generated and when they get copied into what temporary install directory. It's really a pain

@goldsborough
Contributor Author

Tests are looking green. I think someone (@apaszke, @colesbury, @zdevito) should take another look to sign off or raise any remaining concerns. I believe @zdevito is waiting for this one to land soon.

@apaszke
Contributor

apaszke commented Mar 8, 2018

The code looks good, but we're still hardly doing any checks (only in requires_grad + set_requires_grad), so it's trivial to shoot yourself in the foot, and it's incompatible with our previous API in a way that will cause segfaults.

@goldsborough
Contributor Author

goldsborough commented Mar 8, 2018

OK, just so I understand: where should I add more checks? Do you mean in torch internals to make sure the unwrapping works, or in our user-facing API (though it's just those two methods right now)? As for internals, I believe I've found all the cases where we were expecting this wrapping/re-wrapping behavior, and it seemed to be only in the JIT (which has the special conversion methods to preserve the old behavior).

@zdevito
Contributor

zdevito commented Mar 9, 2018

I think we should merge this pretty much as is, perhaps the only change being to remove the DEBUG guards on that conversion. Even before this change, having users use Tensors/Variables from C++ was subject to segfaults because of the unchecked constructors. This change was never meant to address that. Instead, it was meant to make the conversion from Python<->C++ tensors consistent regardless of whether the static type in C++ is marked Tensor or Variable. This is needed to clean up several places in the JIT where we need to do list copies/static casts just to get pybind to cooperate. I agree that before we mark C++ Tensors non-alpha quality we need to make sure all the conversions are checked, but I don't think that has to be done in the same commit as these changes.

@apaszke
Contributor

apaszke commented Mar 9, 2018

I'm ok with merging this if we at least do the checks that we never return raw tensors to Python (and raise a clear error instead of corrupting memory). Also, we have docs for these functions now, and people are starting to use them, so if we're planning to do more breaking changes like this one, please let's clearly mark the API as experimental in the docs.
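
Roughly the shape of that guard on the C++ -> Python path; is_wrapped_in_variable() is a placeholder for whatever predicate the expensive check (and later the cheap one) provides.

```cpp
#include <ATen/ATen.h>

#include <stdexcept>

// Placeholder predicate for this sketch (the real check is the expensive
// one first, then the cheap flag-based check in a follow-up PR).
bool is_wrapped_in_variable(const at::Tensor& t);

// Guard on the C++ -> Python path: refuse to hand a raw at::Tensor back to
// Python and raise a clear error instead of risking memory corruption.
at::Tensor checked_return(const at::Tensor& result) {
  if (!is_wrapped_in_variable(result)) {
    throw std::runtime_error(
        "expected a Variable when returning to Python, got a raw at::Tensor");
  }
  return result;
}
```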

@zdevito
Contributor

zdevito commented Mar 9, 2018

@apaszke I agree, we should add the dynamic check when converting C++ -> Python. @colesbury: yeah, let's make that check cheap using Ed's modifications. Let's do this in two steps: first modify this PR to use the expensive check (none of the current pybind paths are perf-critical), and then do a follow-up PR that replaces it with the fast check. This PR is making some JIT PRs harder because I keep forgetting to cast at::Tensors to Variable before sending to pybind.

@goldsborough
Contributor Author

Zach meant: "Not having this PR is making some JIT PRs harder because I keep forgetting to cast at::Tensors to Variable before sending to pybind."

@goldsborough
Contributor Author

I've enabled variableness checks always and everywhere now. Let's merge this and then add Ed's cheap variable check on top asap.

@zdevito zdevito merged commit 7391dae into pytorch:master Mar 9, 2018
@goldsborough goldsborough deleted the variable-python branch March 9, 2018 23:12