Fix torch.nn.functional.grid_sample crashes if grid has NaNs
#42703
Conversation
facebook-github-bot left a comment
@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
💊 CI failures summary and remediations: as of commit a58b4b2 (more details on the Dr. CI page), 2 new failures were recognized by patterns. They do not appear to be due to upstream breakages.
Force-pushed from 6a5ea42 to c9185c6.
facebook-github-bot left a comment
@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Wait. Isn't NaN propagation what we are supposed to do?

I thought we decided to propagate NaN as much as we could. I suppose it is more important in grid_sample, which is often used as a layer/op in a neural network. Yet this PR stops propagation IIRC.

But if there are […]

Why can't we? If there is […]
Force-pushed from c9185c6 to 3f3a793.
Propagating NaNs from the grid into the output would be quite expensive, and before this patch it would almost certainly have resulted in a segfault.
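For context, a minimal standalone sketch of why an unclamped NaN grid coordinate can crash the kernel (an illustration of the failure mode, not the actual ATen code): every comparison with NaN is false, so comparison-based bounds checks and naively written clamps let the NaN through, and the coordinate is later converted to an integer index.

```cpp
#include <cmath>
#include <cstdio>

// Illustration only (not the ATen kernel): every comparison with NaN is
// false, so a NaN grid coordinate slips past range checks and naive clamps,
// and is later cast to an integer index, which is undefined behavior and
// typically yields a wild out-of-bounds offset.
int main() {
  float coord = std::nanf("");
  float max_coord = 3.0f;  // e.g. width - 1 for a hypothetical 4-pixel row

  std::printf("coord >= 0      : %d\n", coord >= 0.0f);       // 0
  std::printf("coord <= max    : %d\n", coord <= max_coord);  // 0
  std::printf("coord <  0      : %d\n", coord <  0.0f);       // 0 (NaN defeats every test)

  // A clamp written as "coord < 0 ? 0 : coord" keeps the NaN:
  float naive = (coord < 0.0f) ? 0.0f : coord;
  std::printf("naive clamp     : %f\n", naive);                // nan

  // NaN-aware fmin/fmax return the non-NaN operand, so the coordinate
  // lands on a valid boundary instead of becoming a garbage index.
  float safe = std::fmax(0.0f, std::fmin(coord, max_coord));
  std::printf("NaN-aware clamp : %f\n", safe);                 // 3.0
  return 0;
}
```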
Force-pushed from 3f3a793 to 82437bf.
facebook-github-bot left a comment
@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Hmm, sure. Maybe worth saying so in comments then? The previous commit says something like "it is important to clamp NaN", which was unintuitive to me.
ngimel left a comment
Can you please clarify in the docs what happens for NaN/inf values in the grid? Interestingly, the original issue was about a very large value turning into inf; it looks like that should have been handled before by the minimum/maximum clipping?
Also, this fix is CPU-only; what happens on CUDA?
nice!
In `clip_coordinates`, replace the `minimum(maximum(in))` composition with `clamp_max(clamp_min(in))` and swap the order of the `clamp_min` operands to clamp NaNs in the grid to 0
Force-pushed from 82437bf to a58b4b2.
Done
It was turning into […]
Somehow it's already working like that on CUDA (perhaps the GPU automatically turns […]).
facebook-github-bot left a comment
@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Great, can you add that to the docs note too, that large values can be converted to […]?
FYI, the CUDA fix is at #35506. Perhaps we can merge the test in this PR with it and remove Lines 10098 to 10139 in 3cf2551.
In `clip_coordinates` replace `minimum(maximum(in))` composition with `clamp_max(clamp_min(in))`.
Swap order of `clamp_min` operands to clamp NaNs in grid to 0.
Fixes #42616
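A scalar sketch of the operand-order trick this summary describes (the real change is in the vectorized `clamp_min`/`clamp_max` path; the `max_like`/`min_like` helpers below are hypothetical stand-ins): with a min/max written as a comparison plus select, any comparison involving NaN is false, so the second operand is returned, and which slot holds the grid value versus the bound decides whether NaN propagates or is clamped to the bound.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical scalar stand-ins for a comparison-plus-select min/max:
// when either operand is NaN the comparison is false, so the second
// operand is selected.
static float max_like(float a, float b) { return a > b ? a : b; }
static float min_like(float a, float b) { return a < b ? a : b; }

int main() {
  const float coord = std::nanf("");  // NaN coming from the grid
  const float lo = 0.0f, hi = 3.0f;   // clip bounds

  // Bound in the second slot: NaN loses every comparison, so the bound
  // survives and the NaN coordinate is clamped to a valid index (0 here).
  float clamped = min_like(max_like(coord, lo), hi);
  std::printf("bound-wins order: %f\n", clamped);  // 0.0

  // Opposite operand order: NaN sits in the second slot, survives both
  // selects, and would later be cast to a garbage index.
  float leaked = min_like(hi, max_like(lo, coord));
  std::printf("NaN-leaks order : %f\n", leaked);   // nan
  return 0;
}
```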