Fix Variable conversion on the way to/from Python #5581
Conversation
I don't think that as_variable is a good solution. If I understand correctly, the problem is that user extensions generally will get Variables, but if they allocate tensors inside, they will not be wrapped. In my opinion the way we should go about this would be to never let people work on raw tensors directly.
Otherwise our extension API will be very error prone, since tensors you get from ATen will generally behave differently from tensors you allocated yourself (I don't think you can safely mix them when calling different ops). Also, what happens if you forget to wrap them and you don't have a debug build of PyTorch (which almost no one has)? Looks like a segfault to me.
Ensuring that everything users see is a Variable will make it safe, and will make the extension API equivalent to running the Python code, but faster.
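A minimal sketch of the kind of mixing being warned about, as hypothetical extension code (the function and the headers used are assumptions, not part of this PR):

```cpp
#include <torch/torch.h>

// Hypothetical extension op. `input` arrives through the Python binding and is
// meant to be backed by a Variable, while `bias` below is allocated inside the
// extension and, at the time of this PR, is a plain ATen tensor with no
// Variable wrapper. Combining the two in one expression is the error-prone mix
// being discussed.
at::Tensor scale_and_shift(const at::Tensor& input) {
  at::Tensor scaled = input * 2;          // operates on the tensor from Python
  at::Tensor bias = at::ones_like(input); // raw tensor created inside C++
  return scaled + bias;                   // mixes the two kinds of tensors
}
```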
I see your point. At the same time, I don't think there's a way to restrict users from creating "raw tensors" right now -- anyone can allocate a new tensor. Do we want to just say that's unsupported? I imagine the safest thing to do would be to add a runtime check to see whether the tensor actually is a variable and, if not, use as_variable. I'm not opposed to the idea that all tensors have to be passed in externally (although it does mean C++ extensions would have to be pure/functional, and thus the idea of binding classes goes away).
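A minimal sketch of the runtime fallback floated here, assuming some predicate for "already a Variable" and the torch::as_variable helper from this PR's description (both names are placeholders; the API was still in flux in this thread):

```cpp
#include <torch/torch.h>

// If the tensor is already a Variable, pass it through; otherwise wrap it so
// autograd and the Python bindings can handle it. is_variable() and
// torch::as_variable() are assumptions drawn from the discussion, not a
// settled API.
at::Tensor ensure_variable(const at::Tensor& t) {
  if (t.is_variable()) {
    return t;
  }
  return torch::as_variable(t);
}
```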
I think for now it's safer to disallow passing tensors that require grad into the extensions. If they don't require grad, they would get unwrapped in the binding code and rewrapped again when going back to Python. At least this ensures that it's hard to shoot yourself in the foot.
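A sketch of the guard this would imply at the binding boundary (the helper name and error message are assumptions):

```cpp
#include <stdexcept>
#include <string>
#include <torch/torch.h>

// Reject inputs that require grad before the binding code unwraps them, since
// unwrapping would silently drop their autograd history.
void check_no_requires_grad(const at::Tensor& t, const std::string& arg_name) {
  if (t.requires_grad()) {
    throw std::runtime_error(
        arg_name + ": passing tensors that require grad into a C++ extension "
                   "is not supported");
  }
}
```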
Wait, so that means that we can't write new autograd Functions entirely in C++?
No, it's not currently possible (both before and after this patch). You can expose forward and backward functions to Python and use those to implement a Function using our public API.
And no, variables inside forward don't require grad and their history is irrelevant (forward runs in no-grad mode). If you used the extension in backward, you should have used once_differentiable or implemented and applied a backward autograd function.
We have use cases where we want to compose operators inside of C++ and have their gradients recorded. So we do need to support requires_grad in C++.
@zdevito I agree this is a useful mode to have. I'm just saying that we should be careful when designing the API that is meant to support variables, because one should never mix both modes unless they really know what they're doing. The type system doesn't help in catching those errors at all, and they will lead to memory corruption or segfaults, which is just going to be an endless source of frustration for users.
I've revamped the API.
@pytorchbot retest this please
I really do not understand
Tests are looking green. I think someone (@apaszke, @colesbury, @zdevito) should take another look at it to sign off or raise any remaining concerns. I believe @zdevito is waiting for this one to land soon.
The code looks good, but we're still hardly doing any checks (only in DEBUG builds).
OK, just to understand: where should I add more checks? Do you mean in torch internals, to make sure the unwrapping works, or in our user-facing API (though it's just those two methods right now)? As for internals, I believe I've found all the cases where we were expecting this wrapping/re-wrapping behavior, and it seemed to be only in the JIT (which has the special conversion methods to preserve the old behavior).
I think we should merge this pretty much as is, perhaps the only change being to remove the DEBUG guards on that conversion. Even before this change, having users use Tensors/Variables from C++ was subject to segfaults because of the unchecked constructors. This change was never meant to address that. Instead, it was meant to make the Python <-> C++ tensor conversion consistent regardless of whether the static type in C++ is marked Tensor or Variable. This is needed to clean up several places in the JIT where we need to do list copies/static casts just to get pybind to cooperate. I agree that before we mark C++ Tensors non-alpha quality we need to make sure all the conversions are checked, but I don't think that has to be done in the same commit as these changes.
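For illustration, the kind of list copy and static_cast shim described here might look roughly like this (the helper is invented; it relies on Variable deriving from at::Tensor in this era, which is what makes the cast compile):

```cpp
#include <vector>
#include <torch/csrc/autograd/variable.h>

// Copy a vector of ATen tensors into Variables purely so that pybind11
// converts them back to Python tensors correctly; the consistent conversion in
// this PR is meant to make shims like this unnecessary.
std::vector<torch::autograd::Variable> as_variable_list(
    const std::vector<at::Tensor>& tensors) {
  std::vector<torch::autograd::Variable> variables;
  variables.reserve(tensors.size());
  for (const auto& t : tensors) {
    variables.push_back(static_cast<const torch::autograd::Variable&>(t));
  }
  return variables;
}
```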
I'm OK with merging this if we at least add checks that we never return raw tensors to Python (and raise a clear error instead of corrupting memory). Also, we have docs for these functions now, and people are starting to use them, so if we're planning to do more breaking changes like this one, let's clearly mark the API as experimental in the docs.
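A sketch of what such a check might look like on the C++ -> Python path (the predicate and the message are assumptions; the real check lives in the binding code):

```cpp
#include <stdexcept>
#include <torch/torch.h>

// Fail loudly instead of corrupting memory if a plain ATen tensor is about to
// be handed back to Python without having been wrapped as a Variable.
const at::Tensor& check_returns_variable(const at::Tensor& result) {
  if (!result.is_variable()) {  // assumed Variable-ness predicate
    throw std::runtime_error(
        "expected a Variable on the C++ -> Python path, got a plain at::Tensor");
  }
  return result;
}
```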
@apaszke I agree, we should add the dynamic check when converting C++ -> Python. @colesbury: yeah, let's make that check cheap using Ed's modifications. Let's do this in two steps: we first modify this PR to use the expensive check (all the current pybind paths are not perf critical), and then do a follow-up PR that replaces it with the fast check. This PR is making some JIT PRs harder because I keep forgetting to cast at::Tensors to Variable before sending them to pybind.
Zach meant: "Not having this PR is making some JIT PRs harder because I keep forgetting to cast at::Tensors to Variable before sending them to pybind."
I've enabled variableness checks always and everywhere now. Let's merge this and then add Ed's cheap variable check on top asap.
This PR changes the behavior of how Python variables (torch.tensor) are converted to ATen Tensors/autograd Variables. Previously, we were unwrapping variables into plain tensors on the Python -> C++ path, therefore losing autograd history. We would then create new variables from the tensors on the C++ -> Python path. The major drawback of this is that it is currently not possible for users to define tensors from C++ (e.g. in extensions) and have them work with the autograd.

Now, Python variables are dynamically converted (i.e. "upcast") to tensors on the Python -> C++ path, and then re-wrapped into variables on the way back (i.e. without creating a new variable, just using the Variable constructor).

For this, I had to add a function to our public C++ API that wraps tensors into variables. To avoid confusion, we want to keep users unaware of the concept of a Variable in C++. Therefore we're not using torch::autograd::make_variable, which returns a Variable, but instead torch::as_variable(), which returns a Tensor and is declared inside torch/torch.h and defined out of sight in torch/csrc/torch.cpp (new file). I went for as_variable because it's a bit clearer and also to avoid confusion with make_variable on our side (users never see torch::autograd::make_variable). Happy to hear thoughts on this.

In the JIT we do want to unwrap/rewrap (according to @zdevito), so we do that manually there.
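For extension authors, the intended effect could be sketched roughly as follows (illustrative code only, using the as_variable name from this description; the API was still being revised in this thread):

```cpp
#include <torch/torch.h>

// With the new conversion, `input` keeps its autograd history across the
// Python -> C++ boundary, so this result can stay differentiable.
at::Tensor double_it(const at::Tensor& input) {
  return input + input;
}

// A tensor allocated inside C++ is still a plain ATen tensor; per the
// description above, it should be wrapped before going back to Python.
at::Tensor ones_variable(const at::Tensor& like) {
  return torch::as_variable(at::ones_like(like));
}
```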
@zdevito @colesbury @ezyang @apaszke