adding a beta parameter to the smooth_l1 loss fn #44433
Conversation
Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
[ghstack-poisoned]

Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
ghstack-source-id: 90a3b08
Pull Request resolved: #44433
💊 CI failures summary and remediations — as of commit fd6d110 (more details on the Dr. CI page):
XLA failure: job pytorch_xla_linux_bionic_py3_6_clang9_build is failing.
Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
updated the smooth_l1_loss signature in the torch api, added a torch test
ghstack-source-id: ad6153e
Pull Request resolved: #44433

Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
updated the smooth_l1_loss signature in the torch api, added a torch test
added a cpp_api_parity test to test the actual kernel calls
ghstack-source-id: 9686540
Pull Request resolved: #44433
    return norm_val * grad_output;
  else
-   return norm_val * x * grad_output;
+   return norm_val * x * grad_output / beta;
the smooth l1 loss now divides by the beta parameter when |x| < beta, so the derivative does as well.
I might be wrong. If it divides by beta in the forward pass, should we multiply the grad by beta in the backward pass?
Chatted with @bdhirsh offline. I was wrong. Brian's current implementation is correct.
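A quick way to see why dividing is right: differentiating the quadratic branch 0.5 * x**2 / beta with respect to x gives x / beta, so the backward divides by beta rather than multiplying. A minimal sketch of that check, using the post-PR formula and letting autograd do the derivative (the helper name here is illustrative only):

```python
import torch

def smooth_l1_ref(x, beta):
    # reference elementwise smooth L1 with the beta threshold from this PR
    return torch.where(x.abs() < beta, 0.5 * x ** 2 / beta, x.abs() - 0.5 * beta)

beta = 2.0
x = torch.tensor([0.3, -0.7], requires_grad=True)  # both elements in the quadratic region
smooth_l1_ref(x, beta).sum().backward()
print(x.grad)             # tensor([ 0.1500, -0.3500])
print(x.detach() / beta)  # same values: the gradient is x / beta, not x * beta
```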
Making beta a double rather than a Scalar, since it's used in comparisons, which don't really make sense for complex types. See the comment here (sorry for the duplicate PR): #44197 (review)
  namespace native {

  using pointwise_fn = void (*)(TensorIterator&, Scalar scalar);
+ using pointwise_fn_beta = void (*)(TensorIterator&, Scalar scalar, double beta);
lmk if you think the name should be more general, or if I just shouldn't use an alias here. Right now smooth_l1_backwards looks like the only pointwise op here that takes in two scalar params.
  inline Tensor _smooth_l1_loss(const Tensor& input, const Tensor& target, double beta = 1.) {
    auto t = torch::abs(input - target);
-   return torch::where(t < 1, 0.5 * torch::pow(t, 2), t - 0.5);
+   return torch::where(t < beta, 0.5 * torch::pow(t, 2) / beta, t - 0.5 * beta);
it's not 100% clear to me if I need to change this, since this seems independent of the kernel code
- Tensor smooth_l1_loss_double_backward(const Tensor & grad, const Tensor & input, const Tensor & target, int64_t reduction) {
+ Tensor smooth_l1_loss_double_backward(const Tensor & grad, const Tensor & input, const Tensor & target, int64_t reduction, double beta) {
+   // TODO: figure out where to use beta
not 100% sure how beta should be used in the double_backwards of smooth_l1_loss
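One way to pin this down is to lean on gradcheck/gradgradcheck, which compare analytical and numerical Jacobians in double precision. A sketch of that kind of check, assuming the functional API already exposes beta (the state after this PR):

```python
import torch
import torch.nn.functional as F
from torch.autograd import gradcheck, gradgradcheck

x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
y = torch.randn(4, 3, dtype=torch.double)
fn = lambda inp: F.smooth_l1_loss(inp, y, beta=0.5)

assert gradcheck(fn, (x,))      # first derivative w.r.t. the input
assert gradgradcheck(fn, (x,))  # second derivative, i.e. the double backward path
```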
mrshenli left a comment:
Thanks for adding this!
We might also need to change the SmoothL1Loss class to use this new function API, and update its doc string as well. This can come in a followup PR.
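For reference, the end state being suggested would look roughly like this — a sketch, assuming SmoothL1Loss simply forwards beta to the functional call:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pred = torch.randn(8, 4)
target = torch.randn(8, 4)

loss_fn = nn.SmoothL1Loss(beta=0.25)          # module API with the new beta knob
module_loss = loss_fn(pred, target)
functional_loss = F.smooth_l1_loss(pred, target, beta=0.25)
assert torch.allclose(module_loss, functional_loss)  # module just forwards beta
```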
Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
updated the smooth_l1_loss signature in the torch api, added a torch test
added a cpp_api_parity test to test the actual kernel calls
Updating the python API + docs
ghstack-source-id: 06cc809
Pull Request resolved: #44433

Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
Differential Revision: [D23636720](https://our.internmc.facebook.com/intern/diff/D23636720)
[ghstack-poisoned]

Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
updated the smooth_l1_loss signature in the torch api, added a torch test
added a cpp_api_parity test to test the actual kernel calls
Updating the python API + docs
some test fixes
ghstack-source-id: 48e59a6
Pull Request resolved: #44433

Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
updated the smooth_l1_loss signature in the torch api, added a torch test
added a cpp_api_parity test to test the actual kernel calls
Updating the python API + docs
some test fixes
fix linter errors
ghstack-source-id: 0e2c279
Pull Request resolved: #44433
Codecov Report
@@ Coverage Diff @@
## gh/bdhirsh/8/base #44433 +/- ##
=====================================================
- Coverage 68.04% 67.98% -0.07%
=====================================================
Files 384 384
Lines 49814 49596 -218
=====================================================
- Hits 33897 33718 -179
+ Misses 15917 15878 -39
Continue to review full report at Codecov.
Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
updated the smooth_l1_loss signature in the torch api, added a torch test
added a cpp_api_parity test to test the actual kernel calls
Updating the python API + docs
some test fixes
fix linter errors
fixing double backwards fn
fixing smooth_l1_loss_out to update memory in place correctly
removing TODOs
kernel fix - casting beta to the same scalar type as the tensor to prevent unnecessary type conversions (e.g. when we have a tensor of floats)
fixing test
ghstack-source-id: c05ede8
Pull Request resolved: #44433
Thanks for the heads-up, I will make the corresponding change on the XLA side.
mrshenli left a comment:
If the concern about division by 0 is real, let's add a test to check the case where |x - y| == 0.
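Something along these lines would cover it — a sketch of the suggested check, not the actual test that was added to the suite: identical input and target (so |x - y| == 0 everywhere) together with beta == 0, asserting that neither the loss nor the gradient turns into NaN/Inf.

```python
import torch
import torch.nn.functional as F

x = torch.zeros(5, requires_grad=True)
t = torch.zeros(5)                      # |x - t| == 0 everywhere
loss = F.smooth_l1_loss(x, t, beta=0)   # beta == 0: the worst case for a "/ beta"
loss.backward()
assert torch.isfinite(loss).all()
assert torch.isfinite(x.grad).all()
```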
  Tensor smooth_l1_loss_double_backward(const Tensor & grad, const Tensor & input, const Tensor & target, int64_t reduction, double beta) {
    auto d = (input - target).abs();
-   auto grad_input = grad * (d < 1).type_as(grad);
+   auto grad_input = grad * (d < beta).type_as(grad) / beta;
can beta be 0 here?
yep, good catch. I'll try rewriting it in a way to prevent divide by zeros.
    return norm_val * grad_output;
  else
-   return norm_val * x * grad_output;
+   return norm_val * x * grad_output / beta;
looks like if x == beta == 0, we could reach here, and then trigger division by 0?
I might be missing something, but this looks different from the math given in the doc:
0.5 (x_i - y_i)^2 / beta, & \text{if } |x_i - y_i| < beta
Should the three conditions be x <= -beta, x >= beta, and otherwise?
yep you're right, thanks. And using x <= -beta and x >= beta should fix the division by 0 issue.
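In Python terms, the corrected branching looks like this (an illustrative reference only — the real change lives in the CPU/CUDA kernels, and the function name here is made up): with x <= -beta and x >= beta as the outer branches, the division by beta is only reached when -beta < x < beta, which is an empty set when beta == 0.

```python
import torch

def smooth_l1_backward_ref(x, grad_output, norm_val, beta):
    # x is (input - target); norm_val is the reduction scaling factor
    out = torch.empty_like(x)
    out[x <= -beta] = -norm_val
    out[x >= beta] = norm_val
    mid = (x > -beta) & (x < beta)     # empty when beta == 0, so the division below is never hit
    out[mid] = norm_val * x[mid] / beta
    # (at x == 0 with beta == 0 this picks the +norm_val subgradient of |x|)
    return out * grad_output
```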
torch/testing/_internal/common_nn.py
Outdated
- output = ge_one_mask * (abs_diff - 0.5) + lt_one_mask * 0.5 * (abs_diff ** 2)
+ ge_beta_mask = (abs_diff >= beta).type_as(abs_diff)
+ lt_beta_mask = (abs_diff < beta).type_as(abs_diff)
+ output = ge_beta_mask * (abs_diff - 0.5 * beta) + lt_beta_mask * 0.5 * (abs_diff ** 2) / beta
Does this also need to handle divide by 0 case?
torch/nn/modules/loss.py
Outdated
  __constants__ = ['reduction']
+ beta: float

  def __init__(self, size_average=None, reduce=None, reduction: str = 'mean') -> None:
this makes beta a class attribute instead of an instance attribute?
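For what it's worth, a bare `beta: float` in a class body only records an annotation (which TorchScript reads for typing); it doesn't create a class attribute by itself, and the instance still sets `self.beta` in `__init__`. A standalone sketch of the Python semantics (the class name here is made up, not the real module):

```python
class SmoothL1LossSketch:
    beta: float   # annotation only: no value is bound at class level

    def __init__(self, beta: float = 1.0) -> None:
        self.beta = beta   # the actual attribute lives on the instance

m = SmoothL1LossSketch(beta=0.5)
print(m.beta)                               # 0.5
print('beta' in vars(SmoothL1LossSketch))   # False -- no class attribute was created
```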
PR for issue: #16357
Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
Differential Revision: [D23636720](https://our.internmc.facebook.com/intern/diff/D23636720)
[ghstack-poisoned]

Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
updated the smooth_l1_loss signature in the torch api, added a torch test
added a cpp_api_parity test to test the actual kernel calls
Updating the python API + docs
some test fixes
fix linter errors
fixing double backwards fn
fixing smooth_l1_loss_out to update memory in place correctly
removing TODOs
kernel fix - casting beta to the same scalar type as the tensor to prevent unnecessary type conversions (e.g. when we have a tensor of floats)
fixing test
fixing divide-by-zero issues
ghstack-source-id: 993e5df
Pull Request resolved: #44433

Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
updated the smooth_l1_loss signature in the torch api, added a torch test
added a cpp_api_parity test to test the actual kernel calls
Updating the python API + docs
some test fixes
fix linter errors
fixing double backwards fn
fixing smooth_l1_loss_out to update memory in place correctly
removing TODOs
kernel fix - casting beta to the same scalar type as the tensor to prevent unnecessary type conversions (e.g. when we have a tensor of floats)
fixing test
fixing divide-by-zero issues
ghstack-source-id: f050591
Pull Request resolved: #44433
      pickle=False)

+ def smoothl1loss_zero_beta_test():
this needs to be added to new_module_tests?
      lambda i: F.smooth_l1_loss(i, t.type_as(i), reduction='none', beta=0)),
  cpp_function_call='''F::smooth_l1_loss(
      i, t.to(i.options()), F::SmoothL1LossFuncOptions().reduction(torch::kNone), 0)''',
  input_fn=lambda: torch.randn(2, 3, 4),
this won't trigger zero loss, right? If we want to test zero loss, will using t.clone() as input work?
Two things:
- I started trying to modify the backwards kernels to avoid/check for divide-by-zeros, but I realized that setting beta=0 is really just the same thing as using l1_loss. I updated the definitions in Loss.cpp to check for that case and just call out to their corresponding l1_loss functions directly.
- I added a test to the suite that uses beta=0, but I'm not testing the case where the two tensors are the same, because that doesn't look like it will play well with this test suite. The suite checks the logic by computing the analytical jacobian of the loss (using my backwards) as well as the numerical jacobian (using (f(x+e) - f(x-e)) / (2*e)), but that comparison breaks down when one of the functions isn't continuous. In this case, the second derivative of smooth_l1_loss when beta=0 isn't continuous: it's infinite at x=0 and 0 everywhere else (see the sketch below).
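To make that discontinuity concrete — a small illustration, assuming beta=0 falls back to L1 as described above — the central difference of the gradient around 0 approximates the second derivative, and it blows up as eps shrinks even though the analytical second derivative is 0 everywhere away from 0:

```python
import torch
import torch.nn.functional as F

eps = 1e-6
t = torch.zeros(1)

def grad_at(v):
    x = torch.tensor([v], requires_grad=True)
    F.smooth_l1_loss(x, t, beta=0.0, reduction='sum').backward()
    return x.grad.item()

# central-difference estimate of the second derivative at 0:
second = (grad_at(eps) - grad_at(-eps)) / (2 * eps)
print(second)   # ~1e6 here, and it diverges as eps -> 0; analytically it is 0 for x != 0
```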
torch/testing/_internal/common_nn.py
Outdated
  lt_beta_mask = (abs_diff < beta).type_as(abs_diff)
+ # guard against divide-by-zero when beta is zero.
+ if beta == 0:
+     output = abs_diff - 0.5 * beta
since beta is 0, we can remove the 0.5 * beta from the formula?
Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
updated the smooth_l1_loss signature in the torch api, added a torch test
added a cpp_api_parity test to test the actual kernel calls
Updating the python API + docs
some test fixes
fix linter errors
fixing double backwards fn
fixing smooth_l1_loss_out to update memory in place correctly
removing TODOs
kernel fix - casting beta to the same scalar type as the tensor to prevent unnecessary type conversions (e.g. when we have a tensor of floats)
fixing test
fixing divide-by-zero issues - when beta <= 0 we should just call out directly to l1_loss
ghstack-source-id: 81905de
Pull Request resolved: #44433

Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
updated the smooth_l1_loss signature in the torch api, added a torch test
added a cpp_api_parity test to test the actual kernel calls
Updating the python API + docs
some test fixes
fix linter errors
fixing double backwards fn
fixing smooth_l1_loss_out to update memory in place correctly
removing TODOs
kernel fix - casting beta to the same scalar type as the tensor to prevent unnecessary type conversions (e.g. when we have a tensor of floats)
fixing test
fixing divide-by-zero issues - when beta <= 0 we should just call out directly to l1_loss
ghstack-source-id: dd5afc7
Pull Request resolved: #44433
Hey @JackCaoG, I see that the PR on the XLA side has approval, so I'm going to start trying to merge this soon. Thanks for helping!
sounds good!
mrshenli left a comment:
LGTM! Let's land after the XLA PR, so that we won't break master.
You guys can go first; on the XLA side I can simply merge the PR after this one is merged (I think the pytorch merge takes some time).
  python_module: nn

  - func: smooth_l1_loss_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, int reduction, *, Tensor(a!) grad_input) -> Tensor(a!)
  - func: smooth_l1_loss_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, int reduction, float beta=1.0, *, Tensor(a!) grad_input) -> Tensor(a!)
nit: you shouldn't need to provide a default parameter -- we don't expose these backwards functions to python, they should only be called via autograd. And you'll notice that reduction doesn't have a default parameter.
The parts I'm not sure about here are if TorchScript would let you serialize this function (it shouldn't) and/or if the backwards compatibility test would complain.
  \begin{cases}
- 0.5 (x_i - y_i)^2, & \text{if } |x_i - y_i| < 1 \\
- |x_i - y_i| - 0.5, & \text{otherwise }
+ 0.5 (x_i - y_i)^2 / beta, & \text{if } |x_i - y_i| < beta \\
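Putting the two cases together — the new "otherwise" line is cut off in the snippet above, so it is reconstructed here from the forward implementation (`t - 0.5 * beta`) shown earlier in this PR — the updated piecewise term is:

```latex
\begin{cases}
0.5 (x_i - y_i)^2 / beta, & \text{if } |x_i - y_i| < beta \\
|x_i - y_i| - 0.5 * beta, & \text{otherwise}
\end{cases}
```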
the prose above isn't accurate anymore: "Creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise."
we should also document the beta < 0 case.
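Since beta <= 0 now dispatches straight to l1_loss (per the update described earlier in this thread), the behavior to document is easy to sanity-check — a quick sketch:

```python
import torch
import torch.nn.functional as F

x = torch.randn(10)
y = torch.randn(10)
# with beta = 0, smooth_l1_loss should reduce to plain L1 loss
assert torch.allclose(F.smooth_l1_loss(x, y, beta=0), F.l1_loss(x, y))
```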
PR for issue: #16357
Stack from ghstack:
Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
Differential Revision: D23636720