add qr_backward functionality for wide case #42216
Conversation
💊 CI failures summary and remediations: as of commit 86994fe (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚 (This comment was automatically generated by Dr. CI and has been revised 13 times.)
@lucas7 Thank you for working on this! Looks good overall, thank you for throwing in tests, too. I'll have to look at the maths some more, thank you for the explanation. In the meantime, could you fix the tabs, please?
Thanks for the PR! The diff is not great indeed, maybe the tabs. You can replace them and push again to make the linter happy. Also, is it correct that …?
Yeah that's what I get for cutting a PR so late at night. Will remove tabs on update.
Yeah, sorry, I wasn't communicating clearly: what I meant with the comment is that in the case where the input matrix is tall or square, the calculation is the same. I left a comment inline in the PR at the place I was referring to; apologies for the confusion.
Oops, wrong alias, I'm rlucas7, but whoever lucas7 is, they've got an awesome avatar :) Is there a way for me to run the linter locally before I push? I hate to waste the resources to test on the CI with so many builds just for a linter change. I found this one: https://fossies.org/linux/pytorch/CONTRIBUTING.md#pre-commit-tidylinting-hook but I'm not sure if that instruction is up to date.
I ran `flake8` on the changes. Also, not sure whether you prefer me to do a rebase with the changes; I usually wait until all comments are addressed to rebase and squash, and this one was still addressing the changes needed to get the builds to green.
@rlucas7 sorry about the username. You want git-clang-format or so for C++, not flake8.
Regarding the organisation of the code: I wonder whether we could drop the lambda here; it seems to me that we usually just use a separate function.
All lint is fine now :)
From here down is the control flow for the wide case, which is the new stuff.
This comment is using the Python @ operator for the (Euclidean) inner product. If this is confusing as written, I can replace it with the equivalent operator for the Tensor class.
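For instance, a tiny illustration of the notation (not part of the PR):

```python
import torch

# For 1-D tensors the Python @ operator is the Euclidean inner product;
# torch.dot is the Tensor-level equivalent mentioned above.
u = torch.tensor([1., 2., 3.])
v = torch.tensor([4., 5., 6.])
print(u @ v)            # tensor(32.)
print(torch.dot(u, v))  # same value
```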
I think this is fine
I was also a bit confused about the choice of a lambda at first, but there is precedent in this file: there is a lambda for the singular case (pytorch/tools/autograd/templates/Functions.cpp Line 2040 in 26d5850)
and one for the non-singular case (pytorch/tools/autograd/templates/Functions.cpp Line 2047 in 26d5850),
and a similar structure is followed for the other matrix backprop methods (pytorch/tools/autograd/templates/Functions.cpp Line 2136 in 26d5850)
and for logdet_backward (pytorch/tools/autograd/templates/Functions.cpp Line 2088 in 26d5850).
Plus those are closer in the Functions.cpp file to where qr_backward sits, which is why I chose the lambda approach ultimately. If function names aren't exported anywhere, then the only benefit would be if the function were used by another function in Functions.cpp, and I didn't see any case of that.
I don't think it will make it clearer what is old and what is new. I initially had written the function as a separate function to keep it simpler. IIRC the only changes I made when moving it into the lambda were changing some of the matrix overwrites to reduce memory use and handling the cases for …
I think the confusion is around the way GitHub is splitting the lambda refactor in the diff into 3 patches, obscuring that this is really a lift and shift into a lambda. I also left some inline comments to help clarify what's going on in terms of the changes. Maybe take a look at those and let me know if it's still unclear and you want to pull it out as a separate function? If you both still prefer it as a separate function, I can do that.
Overall I think it might help for me to enumerate the changes that were made to …
Also, would it help if I shared some notes on the maths here for the wide case? Using GitHub's LaTeX rendering is pretty clunky, but I could write up the algebra for the wide case in LaTeX and share it via email. Some of the other SciPy devs and I have done that before for some random variate generators and found it helpful for understanding why things were coded as they were.
Hello, friendly inquiry: is anything further needed from my end at this time? Any updates?
albanD
left a comment
Hey,
Sorry for the delay.
The code change looks good to me. I just added small comments about formatting and code comments.
But looking at the test, I think the old specification was not correct and was not checking the gradients. It would be nice if you could re-activate the gradient checks for all the settings and make sure the formula works fine!
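A rough sketch (not the actual test from this PR) of what re-activated gradient checks could look like, covering the tall, square, and wide settings:

```python
import torch

# gradcheck compares the analytic qr_backward against numerical differentiation.
torch.manual_seed(0)
for shape in [(5, 3), (4, 4), (3, 5)]:   # tall, square, wide
    a = torch.randn(*shape, dtype=torch.double, requires_grad=True)
    assert torch.autograd.gradcheck(lambda x: torch.qr(x, some=True), (a,))
```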
What does this comment refer to? Is it just an artefact of an older refactor?
Artefact; I agree the comment is confusing, I'll remove the line.
I think the trick with the captures to avoid changing the code is ok.
But to avoid any issue with unwanted aliasing, I would prevent any capture from the lambda by just setting [] (or, if you need anything, specify it explicitly).
I think the trick with the captures to avoid changing the code is ok.
But to avoid any issue with unwanted aliasing, I would prevent any capture from the lambda by just setting [] (or, if you need anything, specify it explicitly).
Got it. I'll remove the capture ampersand, thanks for catching that one.
nit: you could extract m and n before this to make it slightly more readable.
yes, good idea, that will make it clearer
yes, makes it clearer, thanks for that
I think you're missing a sentence that says how to compute grad_Y?
Or just remove it as it is mentioned below and move the final grad_A formula 3 lines down?
yes, I'll remove the grad_Y part from the line, make the comment a bit clearer, and shuffle the grad_A formula around as you mention
I think this is fine
Actually this does not.
foo = bar does not write into foo inplace. It just frees the old content of foo and puts a copy (talking about a shared_ptr-like object here, so no content copy) of bar into it.
Can you just update the comment?
In general, I would advise against doing inplace changes here as it would most likely prevent double backward from working.
In this case, grad_Y is actually a view of the provided grads that should never be modified inplace.
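A Python analogue of the rebinding-vs-inplace distinction (tensor names here are made up for illustration):

```python
import torch

grad_Y = torch.zeros(2, 2)
other = torch.ones(2, 2)

alias = grad_Y        # a second reference to the same storage
grad_Y = other        # rebinds the name only; the original tensor is untouched
print(alias)          # still all zeros

alias.copy_(other)    # this, by contrast, writes into the storage in place
print(alias)          # now all ones
```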
Actually this does not.
foo = bar does not write into foo inplace. It just frees the old content of foo and puts a copy (talking about a shared_ptr-like object here, so no content copy) of bar into it.
Can you just update the comment?
I will remove the line from the comments.
I think we also want to remove the overwrite, correct?
Thanks for catching that, is this what you're referring to?
pytorch/aten/src/ATen/native/LinearAlgebra.cpp
Lines 679 to 680 in 888ae1b
Tensor output = has_out ? at::_unsafe_view(at::bmm_out(out, tensor1_expanded, tensor2_expanded), output_shape)
                        : at::_unsafe_view(tensor1_expanded.bmm(tensor2_expanded), output_shape);
IIUC, I will add a grad_V to the control flow and remove the overwriting, so it will read:
grad_Y = at::matmul(q, grad_V); on line 2087
(changing the grad_Y entries above to read grad_V).
In general, I would advise against doing inplace changes here as it would most likely prevent double backward from working.
In this case, grad_Y is actually a view of the provided grads that should never be modified inplace.
Yes that change looks good.
is this what you're referring to?
In some sense yes. You can use the version where you provide the output but that does not work with autograd!
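A small Python illustration of that point, assuming current PyTorch behaviour: the out= variants refuse to participate in autograd when an input requires grad.

```python
import torch

a = torch.randn(2, 2, requires_grad=True)
out = torch.empty(2, 2)
try:
    torch.matmul(a, a, out=out)        # out= version, does not work with autograd
except RuntimeError as err:
    print("out= with autograd failed:", err)
```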
No worries @albanD thanks for your time and thorough review.
I wasn't clear on this one after doing some repo spelunking; I left a couple of links in the inline comment with what I found. It doesn't seem that the 5th tuple entry is used? The unit tests do seem to pass both with and without it, though. I'll put together the full changeset to address the comments and update the PR over the weekend.
@rlucas7, just so you know, there is an issue with the current backward. Do you guarantee that the matrix R is full rank?
@rlucas7, @albanD, the way how … But yeah, there is an issue, because users will probably just want a simple `qr` that just works.
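A small illustration of the concern (not from the PR): with a rank-deficient input, R gets a (near-)zero diagonal entry, and the triangular solve inside qr_backward either fails or produces huge / non-finite gradients.

```python
import torch

# Rank-2 square input: the third column is the sum of the first two.
a = torch.tensor([[1., 2., 3.],
                  [2., 4., 6.],
                  [1., 0., 1.]], dtype=torch.double, requires_grad=True)
q, r = torch.qr(a)
try:
    (q.sum() + r.sum()).backward()
    print(a.grad)                    # expect inf/nan or huge entries
except RuntimeError as err:
    print("backward failed:", err)   # the solve may report a singular R
```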
@nikitaved how about we add a warning to the doc specifying the current limitation of the backward for qr, and add this improvement (which has the same limitations)?
@albanD, sure, makes sense. I can actually update the documentation.
It seems like the tril solver issue and the dgeqp3 request are separate issues. AFAICS the only change here with an RRQR (on the forward pass) would be a right multiply by a permutation matrix. The use of RRQR would not change the fact that the equation requires a full-rank R.
@albanD it looks like there are some conflicts on this branch, causing test failures in CI. I make this guess from ctrl-f'ing for "error" in this file: https://dr.pytorch.org/api/view-log-full?build_id=129136002 which seems to indicate merge conflicts; the other Windows builds seem to have network connection issues (unrelated, I assume).
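For reference, a quick illustration of the permutation point using SciPy's pivoted QR (nothing from this PR): with RRQR the factorization satisfies A[:, piv] = Q @ R, so a backward for a pivoted QR would pick up a right-multiplication by that permutation.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))                    # wide input
Q, R, piv = qr(A, mode='economic', pivoting=True)  # RRQR via LAPACK geqp3
print(np.allclose(A[:, piv], Q @ R))               # True
```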
Hi, @nikitaved already sent a PR for the doc so that's good. Yes, can you rebase on top of master to make CI happy please?
albanD
left a comment
LGTM
Will wait on the updated CI signal before merging.
clean up linter errors
remove overwrites and cleanup comments
Codecov Report
@@ Coverage Diff @@
## master #42216 +/- ##
=======================================
Coverage 69.40% 69.40%
=======================================
Files 378 378
Lines 46610 46610
=======================================
+ Hits 32350 32351 +1
+ Misses 14260 14259 -1
Continue to review the full report at Codecov.
facebook-github-bot
left a comment
@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@rlucas7 Thank you very much for the contribution!
Just to make sure, should this make `lstsq` differentiable?
I'm using the nightly version of torch (1.7.0.dev20200902+cu101), and am still getting the `the derivative for 'lstsq' is not implemented` error.
Thanks!
This PR is ***part*** of what you would need to make lstsq autograd work.
There are a couple of other things that would also need to be in place before
you could do the lstsq backward pass.
-Lucas Roberts
On Sep 4, 2020, at 6:46 AM, Dvir Ginzburg <***@***.***> wrote:
@rlucas7 Thank you very much for the contribution!
Just to make sure, should this make 'lstsq' differentiable?
I'm using the nightly version of torch (1.7.0.dev20200902+cu101), and am still getting the `the derivative for 'lstsq' is not implemented` error.
Thanks!
Unblocks implementation of #27036. Note that this PR does not fix #27036.
Currently the QR decomposition backward only has support for the square and tall (a.k.a. skinny) case.
This PR adds functionality for wide A matrices/tensors, includes 3 unit tests for the new case,
and restructures the qr_backward method to use the same Walther method as a helper.
cc @albanD @t-vi
I don't have a GPU machine so I haven't tested on CUDA, but everything passes on my local machine on CPU.
The basic idea of the PR is noted in the comments in the Functions.cpp file, but I'll note it here too for clarity: let A be an m x n matrix with m < n, and partition A as A = [X | Y], where X is the leading m x m block and Y holds the remaining n - m columns. Take the QR of X, X = QU; the Q here from QR(X) is the same as the Q from QR on the entire A matrix. Then transform Y with the Q rotation got from QR(X) to get V = Q^T Y, so that R = [U | V]; grad_R is partitioned the same way into grad_U and grad_V (via narrow()), and grad_A into grad_X and grad_Y. Then grad_Y = Q grad_V, and grad_X is calculated very similarly to the original Walther formula (exactly the same in the tall and square cases) but is slightly modified here for wide case matrices: the square/tall formula is applied to Q and U with grad_Q replaced by grad_Q + Y grad_V^T, and the result is concatenated with grad_Y to give grad_A.
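Below is a rough Python sketch of the partition argument (illustrative names only; this is not the PR's C++ code or its tests):

```python
import torch

# For a wide A = [X | Y], the reduced QR of the leading square block X gives
# the same Q as the reduced QR of A, and R = [U | V] with V = Q^T Y.
torch.manual_seed(0)
m, n = 3, 5
A = torch.randn(m, n, dtype=torch.double)
X, Y = A[:, :m], A[:, m:]

Q, U = torch.qr(X, some=True)   # QR of the leading square block
V = Q.t() @ Y                   # rotate Y into the trailing block of R

# A is reconstructed exactly (up to rounding) from the partitioned factors.
print(torch.allclose(A, Q @ torch.cat([U, V], dim=1)))  # True

# The gradients assemble the same way: grad_Y = Q @ grad_V, and grad_X reuses
# the square-case Walther formula with grad_Q replaced by grad_Q + Y @ grad_V.T.
```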