Make torch.conj() a composite function and return self for real tensors #43270
Conversation
💊 Dr. CI: As of commit 33c3375, no CI failures so far.
aten/src/ATen/native/UnaryOps.cpp (outdated):

```cpp
Tensor& conj_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, conj_stub); }
```
What about this one?
What happens if you try to use autograd with it? Does it still raise a nice error?
I don't follow. It's only called through `at::_conj`, which has a definition in derivatives.yaml. How else can you use it with autograd?
This one is not called from `_conj`, no. The user can call it directly by doing `a.conj(out=b)`, and that should raise an error if either `a` or `b` requires grad.
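For illustration, a minimal sketch of the case in question (the exact error text is an assumption; the point is only that an error should be raised):

```python
import torch

a = torch.randn(3, dtype=torch.complex64, requires_grad=True)
b = torch.empty(3, dtype=torch.complex64)

# Expected to raise, since out= variants don't support autograd
# and `a` requires grad:
torch.conj(a, out=b)
```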
torch/_torch_docs.py (outdated):

```
Computes the element-wise conjugate of the given :attr:`input` tensor. If :attr:`self` tensor
has a non-complex dtype, this function returns the :attr:`self` tensor.
```
Nit: this should be `input` here, not `self`, right?
```cpp
Tensor conj(const Tensor& self) {
  if (!self.is_complex()) {
    return self;
  }
  return at::_conj(self);
}
```
This seems BC-breaking: previously, if a user passed a real tensor to conj, they'd get back a copy of that tensor. What was the rationale behind this change?

FWIW, in NumPy, conj returns a copy of the original array if it isn't complex:

```python
>>> x = np.array([0.1, 0.1])
>>> y = np.conj(x)
>>> y.flags
...
  OWNDATA : True
```
As discussed in #41857, switching to TensorFlow-style gradients would involve adding a bunch of `conj` calls to the gradient definitions. In order to preserve backward performance for real tensors, making `torch.conj` a no-op for real tensors seems like a good option.
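For concreteness, a minimal sketch of the semantics this PR proposes (illustrative, not a test from the PR):

```python
import torch

r = torch.randn(3)
assert r.conj() is r                 # real dtype: no-op, returns self

c = torch.randn(3, dtype=torch.complex64)
cc = c.conj()                        # complex dtype: materializes the conjugate
assert cc is not c
assert torch.equal(cc.imag, -c.imag)
```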
I see, so the decision in that thread was to go with TF-style gradients. However, TF-style gradients require conj calls everywhere in the gradient formulas, so we're trying to optimize calls to conj to avoid regressions.

The downside of calling the operator `fast_conj` is that writers of gradient formulas might use `conj` instead of `fast_conj`. If we go down that route, it would be nice to document somewhere (derivatives.yaml?) that gradient writers should use `fast_conj` and not `conj`. On the other hand, I think it is probably OK for us to break BC, diverge from NumPy, and have our `conj` not copy when possible, but I am not sure which of these two solutions is better.

Exploring the design space some more, I noticed that there were some comments about having a conj flag on tensors that reinterprets the tensor as its conjugate (#41857 (comment)). Was there a decision on that? That seems like it would eliminate the need for a `fast_conj`.
I understand why we want to have a function with this behavior, but I have a few questions/suggestions:

- I would prefer our `conj` have the same behavior as NumPy's.
- `torch.conj` is, conceptually, a universal function, and all universal functions create new outputs.
  - For example, `torch.abs(uint8)` produces a new tensor, even though it could mathematically fulfill its contract by returning self (see the sketch after this list).
  - Analogously, would we expect `torch.ceil(int64)` to return self or a copy of self?
- Do we expect this function to be used by anyone but developers? Is making a non-user-facing function `self_or_conj` an option to address the scenarios we'd like to target? If we're uncertain about those scenarios, could we start with this?
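A small check of that ufuncs-always-allocate claim (a sketch; abs on an unsigned dtype is mathematically the identity, yet still allocates):

```python
import torch

t = torch.tensor([1, 2, 3], dtype=torch.uint8)
out = torch.abs(t)
assert out is not t            # a new tensor is returned...
assert torch.equal(out, t)     # ...even though abs is the identity here
```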
I appreciate @zou3519's concern about gradient writers mistakenly using `conj` instead of `self_or_conj`, and I agree that's a real possibility, but if we're consistent about using the latter and do our job in code review, I think it'll be OK. I'm also less concerned about introducing a fixable performance issue than about creating an inconsistent and semantically confusing UX for end users.
Although I am leaning towards "faithfully replicate NumPy behavior" (@anjali411, I think I mentioned that I was having trouble deciding between these two choices on our call), I want to point out some reasons why fast conj by default might make sense.

> Do we expect this function to be used by anyone but developers? Is making a non-user-facing function `self_or_conj` an option to address the scenarios we'd like to target? If we're uncertain about those scenarios, could we start with this?

`fast_conj` absolutely would be used by non-developers. Any time you write code that you want to generalize over both real and complex inputs, you will write conjugations to handle the complex case, and if they are not no-ops, you will unnecessarily slow down the execution of your code in the real case. There is some more support from @boeddeker here: #41857 (comment)

> Exploring the design space some more, I noticed that there were some comments about having a conj flag on tensors that reinterprets the tensor as its conjugate (#41857 (comment)). Was there a decision on that?

For the BC consideration here, it doesn't help. If we have a fancy version of conjugate that returns a view rather than a new tensor, that would still be incompatible with the NumPy behavior. So if you want to keep NumPy behavior, you're still going to have to come up with a new name like `view_conj` or whatever.
cc @boeddeker
I would lean towards @ezyang's opinion here personally. In particular, "`fast_conj` absolutely would be used by non-developers", and not having it would mean that people start adding a lot of `if t.is_complex()` branches to their code (see the sketch below).

@mruberry, are there functions for which we explicitly plan to have a different API compared to NumPy, and maybe provide a NumPy-compliant version in a different namespace? Could this function be one of them?
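A sketch of that branching pattern (hypothetical user code, not from this PR):

```python
import torch

def inner_product(x, y):
    # Without a cheap conj for real dtypes, generic real/complex code
    # ends up branching like this everywhere:
    xc = x.conj() if x.is_complex() else x
    return (xc * y).sum()
```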
I work a lot with NumPy, and I must say I never checked whether numpy.conj returns a copy or holds onto the original data. Sometimes I got the impression that it is a no-op even for complex numbers when I measured execution time (the execution time is close to nothing compared to any other function). Nevertheless, assuming that conj returns a copy of the source tensor even for real tensors is a bad idea. If you introduce `fast_conj`, I would never use `conj`.

In the context of torch, I would prefer that conj never creates a copy. While the computation of the copy may be suboptimal, the increased memory consumption is the critical part. In NumPy you can compute the conjugate on demand and it will be deleted a few moments later; in torch, it may be kept alive for the backward graph, so when you call conj multiple times, you store several copies.

So maybe it would also be worth considering giving the conjugate operation an argument that enables something like `torch.utils.checkpoint.checkpoint`, or having the conj flag that was mentioned earlier.
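A sketch of that recomputation idea using existing checkpointing (purely illustrative; how completely complex autograd is supported at the time of this PR is an open question):

```python
import torch
from torch.utils.checkpoint import checkpoint

z = torch.randn(4, dtype=torch.complex64, requires_grad=True)
w = torch.randn(4, dtype=torch.complex64)

# The conjugate is recomputed during backward instead of being kept
# alive in the autograd graph:
out = checkpoint(lambda t: (t.conj() * w).sum().abs(), z)
out.backward()
```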
cc @zdevito: this may be one of the first cases where we have existentially important laziness on tensors.
> @mruberry are there functions for which we explicitly plan to have a different API compared to numpy? And maybe provide a numpy-compliant version in a different namespace?

Yes, but we should strive to limit these discrepancies. In particular, we want a compelling reason for the difference (the NumPy behavior would be inconsistent with PyTorch, or it would require an extremely painful deprecation, ...).

> Could this function be one of them?

Sure. I'm actually less worried about the NumPy compatibility of this function than about its internal PyTorch inconsistency. It seems like we're trying to combine the ideas of `torch.conj()` the unary ufunc and a `.conj` attribute with a corresponding view function, like `.real`, `.imag`, and `.T`.
Here's some fun that I think shows we shouldn't follow NumPy exactly. Perhaps the same doesn't apply to complex values:
```python
def test_conj_self(self, device, dtype):
    t = torch.ones(5, 5, device=device)
    s = t.fast_conj()
    self.assertTrue(s is t)
```
Do we actually want this, or do we want `s` to be a view of `t`?
We could, but I think we should just return the same tensor as the input, with the same storage, to make it a complete no-op.
I guess my questions are:
I filed numpy/numpy#17124 about the discrepancy.
I commented on the NumPy issue Mike opened. For the decision to be made here, I think it's more important that PyTorch has a consistent design than that it follow NumPy exactly. The copy-vs-view behavior in NumPy can be inconsistent for no good reason and is very hard to change at this point. I'd prefer it if functions and methods always had the same behavior.
Based on this new information (thanks @gchanan for looking, this is really useful information), I've amended my position: I think we should do a serious investigation of doing conjugate views by default; I kind of suspect we still won't do it. Conjugate views work roughly as follows: conj returns a view that shares storage with the input and carries a flag reinterpreting the data as its conjugate, and kernels consult that flag when they materialize values.
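A rough sketch of that idea in pseudocode (names and structure are hypothetical, not from this PR):

```python
class ConjView:
    """A view that shares storage with its base and carries a conj bit."""

    def __init__(self, base):
        self.base = base        # no copy; same storage as base
        self.is_conj = True

    def materialize(self):
        # Kernels that need actual values pay for the conjugation here,
        # and only here; real tensors never pay at all.
        return self.base.conj() if self.base.is_complex() else self.base
```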
An alternative to conjugate views is peephole lazy tensors, which @zdevito has advocated for: the conjugation is recorded lazily and only materialized at a use site, with the materialized tensor saved for later uses.
A variation of peephole lazy tensors is rematerialization (#42056): instead of saving the materialized tensor, we never save it, and instead always redo the conjugation at each use site (or perhaps there is some policy specifying whether or not to save it).
… real tensors" `torch.conj` is a very commonly used operator for complex tensors, but it's mathematically a no op for real tensors. Switching to tensorflow gradients for complex tensors (as discussed in #41857) would involve adding `torch.conj()` to the backward definitions for a lot of operators. In order to preserve autograd performance for real tensors and maintain numpy compatibility for `torch.conj`, this PR updates `torch.conj()` which behaves the same for complex tensors but performs a view/returns `self` tensor for tensors of non-complex dtypes. The documentation states that the returned tensor for a real input shouldn't be mutated. We could perhaps return an immutable tensor for this case in future when that functionality is available (@zdevito @ezyang ). [ghstack-poisoned]
torch/_torch_docs.py (outdated):

```
Computes the element-wise conjugate of the given :attr:`input` tensor. If the input has a non-complex dtype,
the output tensor shouldn't be mutated.
```
As per @ezyang's suggestion.
Actually, since we're exploring conjugate view, let me suggest we make an even stronger claim: we should tell users to flat out NOT mutate the output of this function, ever. If you avoid doing so, you will write code that is forwards compatible with any possible semantics of conj (it always returns a new copy, it returns a new copy sometimes, it always returns a view).
I despair about actually making sure people don't do this with just a doc update. We really need some way to mark views as non-mutable. Maybe this wouldn't even be that hard to do...
… real tensors" `torch.conj` is a very commonly used operator for complex tensors, but it's mathematically a no op for real tensors. Switching to tensorflow gradients for complex tensors (as discussed in #41857) would involve adding `torch.conj()` to the backward definitions for a lot of operators. In order to preserve autograd performance for real tensors and maintain numpy compatibility for `torch.conj`, this PR updates `torch.conj()` which behaves the same for complex tensors but performs a view/returns `self` tensor for tensors of non-complex dtypes. The documentation states that the returned tensor for a real input shouldn't be mutated. We could perhaps return an immutable tensor for this case in future when that functionality is available (@zdevito @ezyang ). [ghstack-poisoned]
… real tensors" `torch.conj` is a very commonly used operator for complex tensors, but it's mathematically a no op for real tensors. Switching to tensorflow gradients for complex tensors (as discussed in #41857) would involve adding `torch.conj()` to the backward definitions for a lot of operators. In order to preserve autograd performance for real tensors and maintain numpy compatibility for `torch.conj`, this PR updates `torch.conj()` which behaves the same for complex tensors but performs a view/returns `self` tensor for tensors of non-complex dtypes. The documentation states that the returned tensor for a real input shouldn't be mutated. We could perhaps return an immutable tensor for this case in future when that functionality is available (@zdevito @ezyang ). [ghstack-poisoned]
```yaml
- func: _conj(Tensor self) -> Tensor
  use_c10_dispatcher: full
  variants: function
```
I hope this doesn't get traced; that would be `_convolution` all over again.
Maybe we want to rename it to `conj_complex()`, as it is just a version of `conj()` that only accepts complex inputs?
The alias information for `conj` above is now not correct; you need to give it an alias annotation similar to what `contiguous` has.
@anjali411 Basically, `_convolution`, which has a bunch of extra arguments that people typically don't care about, is what gets traced into serialized models, which means it is now very difficult to add new parameters to `_convolution` because they are FC-breaking for all models.

conj is considerably less trafficked, so the risks of getting it wrong here are lower, but it still seems to me that we should prefer tracing `conj` and not `_conj`. You should test which one we're getting in the trace.
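One way to check (a sketch; inspect the printed graph for aten::conj vs aten::_conj):

```python
import torch

def f(x):
    return torch.conj(x)

# Trace through the composite and see which symbol lands in the graph:
traced = torch.jit.trace(f, torch.randn(2, dtype=torch.complex64))
print(traced.graph)
```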
Luckily, I think the intent here is to keep `_conj`'s signature equivalent to `conj`'s, since it's just an autograd hack to apply autograd only to complex inputs of conj. As you point out, @ezyang, `_convolution` suffers because it secretly pulls in arguments the user doesn't expect.

Interesting question, though: when ops are composite, what gets traced, the composite ops or the original symbol? I don't know offhand.
Given that the code that handles tracing and autograd was the same until very recently, I expect that only the inner function is traced.
test/test_torch.py (outdated):

```python
@dtypes(*(torch.testing.get_all_int_dtypes() + torch.testing.get_all_fp_dtypes()))
def test_conj_self(self, device, dtype):
    t = torch.ones(5, 5, device=device)
    s = t.fast_conj()
```
this needs to be updated
In the interest of moving things along, I'd prefer to accept this patch. However, there are a bunch of ways this could go wrong, so I'd prefer it if someone else could also ACK. I'll also take a look at how hard it would be to mark tensors as immutable.
```yaml
  use_c10_dispatcher: full
  variants: function, method

- func: conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
```
Future work: an in-place `conj_`?
torch/_torch_docs.py (outdated):

```
conj(input, *, out=None) -> Tensor

Computes the element-wise conjugate of the given :attr:`input` tensor. If the input has a non-complex dtype,
```
Would: "Computes the element-wise complex conjugate of :attr`input`." be a clearer fist sentence? "Conjugate" seems overloaded.
I wonder if the second sentence shouldn't be a note and/or a warning? Maybe something like:
NOTE: if :attr`input` is a non-complex dtype this function just returns :attr:input.
WARNING: In the future torch.conj may return a non-writeable view of :attr:input. It's recommended that programs not modify the tensor returned by torch.conj to be compatible with this change.
I think the behavior for non-complex tensors should go in the description, but I added a warning as per your suggestion.
mruberry left a comment:
This approach makes sense. Although this function may change in the future to return an immutable view, the complex tensor UX is still in "beta", and @ezyang's suggestion to clarify the documentation seems like a reasonable mitigation. A few cleanup requirements:

- the test needs an update (per @ezyang's note)
- the doc needs clarification
As for NumPy compatibility, sometimes we do need to differ from NumPy for reasons of performance, our focus on neural networks, or our support for autograd, and I think we have a consensus that this approach is the best way (for now) to implement performant complex autograd.

From the perspective of how PyTorch's functions work, it's important that users know what to expect when they call a function. That is, ideally they should be able to infer what a function does without reading its documentation. This means functions should be named reasonably (e.g. `torch.linalg.outer` computes an outer product, while `torch.ger` is indecipherable to most users), and the system should be consistent enough that users don't need to memorize a lot of exceptions or strange rules (for example, functions and methods in PyTorch perform the same computations).

`torch.conj` is a weird duck because there are good arguments for it being a unary ufunc like it is in NumPy, a view function, or a function like `contiguous`, `cpu`, `cuda`, and `to`. The first class of functions has always created a new tensor. The second usually returns a view (ahem, `reshape`), and the third class, which is hard to name, has returned self or a new tensor. From our discussion, however, it seems like `torch.conj` will eventually be in one of the latter two groups. Maybe the third class should be extended (and maybe `torch.abs(uint8)` should return self), or maybe `torch.conj` will eventually return an (immutable) view. Either way, the current approach seems like our best approximation of that future behavior.
Thanks Mike!
… real tensors" `torch.conj` is a very commonly used operator for complex tensors, but it's mathematically a no op for real tensors. Switching to tensorflow gradients for complex tensors (as discussed in #41857) would involve adding `torch.conj()` to the backward definitions for a lot of operators. In order to preserve autograd performance for real tensors and maintain numpy compatibility for `torch.conj`, this PR updates `torch.conj()` which behaves the same for complex tensors but performs a view/returns `self` tensor for tensors of non-complex dtypes. The documentation states that the returned tensor for a real input shouldn't be mutated. We could perhaps return an immutable tensor for this case in future when that functionality is available (@zdevito @ezyang ). [ghstack-poisoned]
Codecov Report

```
@@            Coverage Diff             @@
##   gh/anjali411/53/base   #43270   +/- ##
==========================================
  Coverage              ?   69.32%
==========================================
  Files                 ?      379
  Lines                 ?    47106
  Branches              ?        0
==========================================
  Hits                  ?    32654
  Misses                ?    14452
  Partials              ?        0
```
@anjali411 merged this pull request in 129f406.
`torch.conj` is a very commonly used operator for complex tensors, but it's mathematically a no-op for real tensors. Switching to TensorFlow-style gradients for complex tensors (as discussed in #41857) would involve adding `torch.conj()` to the backward definitions for a lot of operators. In order to preserve autograd performance for real tensors and maintain NumPy compatibility for `torch.conj`, this PR updates `torch.conj()` so that it behaves the same for complex tensors but performs a view / returns the `self` tensor for tensors of non-complex dtypes. The documentation states that the returned tensor for a real input shouldn't be mutated. We could perhaps return an immutable tensor for this case in the future, when that functionality is available (@zdevito @ezyang).

Stack from ghstack:
Differential Revision: D23460493