Implement torch.i0 #43132
Conversation
💊 CI failures summary and remediations — as of commit 087c13e (more details on the Dr. CI page): ci.pytorch.org: 1 failed. (This comment was automatically generated by Dr. CI.)
@mruberry PTAL. This PR implements the i0 function (needed for the Kaiser window). I have a few questions.
These are excellent questions.
That sounds correct but I'll take a closer look as I review (in a moment).
Complex number support can be a separable follow-up. Let's focus on real values for the moment.
Modifying an existing test is fine. I'll take a look at the details during the review.
aten/src/ATen/native/Math.h (Outdated)

    }

    /*
     * The following function has been modified from code that comes with the following copyright notice.
I'll follow up on the correct way to cite Cephes.
I actually took some time to compare scipy.special.i0 with an approximation of the modified Bessel function of the first kind that's accurate in the limit, and I was impressed with how closely they matched on both smaller and larger values. In particular, on larger values I had to go out to 40 iterations of the approximating series before approaching the value produced by scipy.special.i0! I also compared two more modern and simpler approximations that try to fit the modified Bessel function of the first kind and was disappointed in how poorly they approximated the function.
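(Editorial sketch, not from the original thread: the kind of comparison described above can be reproduced roughly by truncating the power series I0(x) = sum_k (x^2/4)^k / (k!)^2 and checking it against scipy.special.i0; the 40-term cutoff and the [0, 20] range are illustrative choices.)

```python
# Editorial sketch: compare a truncated power series for I0 against scipy.special.i0.
import math
import numpy as np
from scipy.special import i0 as scipy_i0

def i0_series(x, terms=40):
    # Truncated power series: I0(x) = sum_k (x^2/4)^k / (k!)^2.
    total = 0.0
    for k in range(terms):
        total += (x * x / 4.0) ** k / math.factorial(k) ** 2
    return total

xs = np.linspace(0.0, 20.0, 201)
max_rel_err = max(abs(i0_series(float(x)) - scipy_i0(float(x))) / scipy_i0(float(x)) for x in xs)
print(f"max relative error of the 40-term series on [0, 20]: {max_rel_err:.3e}")
```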
So then I read this blog post: https://www.advanpix.com/2015/11/11/rational-approximations-for-the-modified-bessel-function-of-the-first-kind-i0-computations-double-precision/#id2020094592.
This was actually implemented in a modified form in Boost (see code links in the comments). I implemented their approach in Python (the correct argument is x * x / 4 for the small argument approximation, despite the blog's LaTeX showing a ceil) and was very impressed with its accuracy.
I actually really like this method because its derivation is clear and it's adaptable to different float types. How interested are you in the numerics? I think it'd be really interesting to investigate this approach further.
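(Editorial sketch, not from the original thread: one way to start investigating that kind of approach, without reproducing the blog's or Boost's actual coefficients, is to fit a Chebyshev polynomial to i0 on a small-argument interval and measure the fit error; the degree and interval below are arbitrary illustrative choices.)

```python
# Editorial sketch: fit a Chebyshev polynomial to i0 on [0, 8] and check the
# fit error against scipy.special.i0. Degree and interval are illustrative
# choices, not the coefficients used by the blog post, Boost, or this PR.
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev
from scipy.special import i0 as scipy_i0

xs = np.linspace(0.0, 8.0, 4001)
fit = Chebyshev.fit(xs, scipy_i0(xs), deg=20, domain=[0.0, 8.0])

rel_err = np.abs(fit(xs) - scipy_i0(xs)) / scipy_i0(xs)
print(f"max relative error of the degree-20 fit on [0, 8]: {rel_err.max():.3e}")
```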
The copyright notice for Cephes we should use is:
Copyright (c) 2018, Steven Moshier
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL Steven Moshier BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Let's put this license in a note:
// Note [3-Clause BSD License for the Cephes Math Library]
// Code derived from implementations in the Cephes Math Library should mention its derivation and reference
// this note (ex. 'This function is derived from the implementation of X in the Cephes Math Library. See note
// [3-Clause BSD License for the Cephes Math Library]. The license is:
// <license goes here>
Then each function can just explain its derivation and reference the note.
Where would I place this note (what directory/file)? Also, should I change the comments on the zeta and digamma functions to follow this reference style for Cephes?
You can replace this first comment in ATen/native/Math.h with it.
Thank you for asking. Yes, it'd be great if you would update the other Cephes references. Looks like there are existing references in ATen/native/cuda/Math.cuh, ATen/native/Math.h, and ATen/native/Distributions.h.
It seems like there actually is an accuracy problem when using bfloat16 or float16. When I increased the size of the tensor to 1000, both the bfloat16 and float16 tests failed, with a max error of 4. I'm not entirely sure why this only occurs when I increase the size to 1000 - I would expect a random sample of 100 values to fail too, especially since the input is only in the range [0, 13.25). I'm also not sure why this only happens on Windows/Mac but works fine on Linux; I would expect it to fail on Linux as well. Should I change the code back to using float32 for bfloat16 and float16?
Interesting! There are a few options ... EDIT: see comment below.
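(Editorial aside, not from the original thread: a max absolute error around 4 is on the order of one ulp for float16 once i0's outputs reach the low thousands, which happens for inputs near 10-11, well inside the [0, 13.25) range mentioned above. A quick back-of-the-envelope check:)

```python
# Editorial aside: show how coarse float16 spacing is at the magnitudes i0
# produces for inputs in roughly [9, 13]. np.spacing gives the gap to the
# next representable value in the given dtype.
import numpy as np
from scipy.special import i0 as scipy_i0

for x in (9.0, 10.0, 11.0, 13.0):
    y = scipy_i0(x)
    ulp = np.spacing(np.float16(y))
    print(f"i0({x:5.2f}) ~ {y:10.1f}; float16 ulp at that magnitude: {float(ulp)}")
```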
aten/src/ATen/native/Distributions.h (Outdated)

    * Cephes Math Library Release 2.8: June, 2000
    * Copyright 1984, 1987, 1992, 2000 by Stephen L. Moshier
    * This function is derived from the implementation of the digamma function in the Cephes Math Library.
    * See note [3-Clause BSD License for the Cephes Math Library] in "ATen/native/Math.h".
No need for quotes around the path.
test/test_torch.py (Outdated)

    expected = np.nextafter(a.cpu().numpy(), b.cpu().numpy())
    self.assertEqual(actual, expected, atol=0, rtol=0)

    def _i0_helper(self, t, dtype):
Nit: this signature could be simplified by acquiring the dtype from t itself (t.dtype)
    @dtypes(torch.float64)
    @unittest.skipIf(not TEST_SCIPY, "SciPy not found")
    def test_i0_range3(self, device, dtype):
Really nice test granularity and use of helpers in these tests.
Just did a quick scan. PR looks excellent. I think we just want to resolve the last design decision of how to compute for bfloat16 and float16. Let me try to enumerate the options: (1) compute natively in the reduced-precision dtype (possibly with a custom vectorized implementation), or (2) compute in float32 and cast the result back to the input dtype.
Option (2) seems the safest to me. PyTorch likes to preserve dtypes where possible, and computing in float32 is easy to do (it would also let you support bfloat16 and half on both CPU and CUDA, I think, and without a custom bfloat16 vectorized implementation, even though that's very cool). Option (1) is better if it's actually faster and preserves accuracy across platforms, however. What are your thoughts?
I agree, I think that option (2) is safer. Especially since other functions are going to be built on top of this one, an error here could propagate and have unintended effects on those functions. I also think that a max error of 4.0 is unacceptable when computing in the lower dtypes, and since it only shows up on certain random values spread through the domain, it would be very hard for people to guard against. It could also cost people extra time to cast their tensors up and back down themselves if they are on Mac or Windows, so while people on Linux would see a performance gain, those on Mac/Windows would see a loss. I think this could use another review now; I believe I have addressed all of the existing review comments.
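(Editorial sketch, not from the original thread: from the Python side, option (2) amounts to the upcast-compute-downcast pattern below. It assumes the torch.i0 API this PR introduces and is illustration only, not the PR's implementation.)

```python
# Editorial sketch of option (2): compute in float32, then round the result
# back to the reduced-precision dtype, so the dominant remaining error is the
# final cast. Assumes the torch.i0 API; not code from this PR.
import torch

t = torch.linspace(0, 13, steps=100).half()

result_fp16 = torch.i0(t.float()).to(t.dtype)

# Compare against a float64 reference.
reference = torch.i0(t.double())
max_abs_err = (result_fp16.double() - reference).abs().max().item()
print(f"max abs error vs. float64 reference: {max_abs_err:.3f}")
```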
    return static_cast<T>(std::exp(x) * chbevl(static_cast<T>(32.0 / x - 2.0), B, 25) / std::sqrt(x));
    }

    inline c10::BFloat16 calc_i0(c10::BFloat16 a) { return calc_i0(static_cast<float>(a)); }
Let's add a comment explaining the upcasting behavior (and our reasoning for it)
    }

    void i0_kernel_cuda(TensorIterator& iter) {
      AT_DISPATCH_FLOATING_TYPES_AND_HALF(iter.dtype(), "i0_cuda", [&]() {
Could we add bfloat16 here and see it just work since bfloat16 uses float32 as its accumulate type?
pytorch/aten/src/ATen/AccumulateType.h (line 25 at 324c18f):

    template <> struct AccumulateType<BFloat16, true> { using type = float; };
    expected = np.nextafter(a.cpu().numpy(), b.cpu().numpy())
    self.assertEqual(actual, expected, atol=0, rtol=0)

    def _i0_helper(self, t):
Brief comment here explaining the i0 tests and why they cut up the range.
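(Editorial sketch, not from the original thread: the helper/test structure might look roughly like the snippet below; the names, tolerances, and ranges are illustrative, not the PR's actual test code. The reason for cutting up the range is that i0 grows exponentially, so each dtype is only exercised where its outputs stay representable, e.g. float16 overflows a little above x = 13 per the [0, 13.25) range discussed above.)

```python
# Editorial sketch of the test structure; names, ranges, and tolerances are
# illustrative. The helper compares torch.i0 against scipy.special.i0 in
# double precision; the range-specific tests keep each dtype where i0's
# outputs stay representable.
import numpy as np
import torch
from scipy.special import i0 as scipy_i0

def check_i0(t):
    expected = scipy_i0(t.cpu().double().numpy())
    actual = torch.i0(t).cpu().double().numpy()
    np.testing.assert_allclose(actual, expected, rtol=1e-3, atol=1e-3)

# i0 overflows float16 a little above x = 13 (cf. the [0, 13.25) range above),
# so a reduced-precision test would be restricted to a range like this one.
check_i0(torch.linspace(0, 13, steps=100))
# float64 can go much further before exp(x) overflows.
check_i0(torch.linspace(0, 100, steps=100, dtype=torch.float64))
```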
mruberry left a comment:
This is awesome! I left a few requests for tiny comments so that future readers will have an easier time understanding the code and tests. Just ping me when this is ready.
@mruberry, I've added the comments and added bfloat16 for GPU. The ROCm test failed for an unrelated reason, so I'm not sure whether bfloat16 actually works there, but other than that I think this should be good to go.
facebook-github-bot left a comment:
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Related to #38349