
Conversation

@muthuArivoli
Contributor

Related to #38349

@muthuArivoli muthuArivoli marked this pull request as draft August 16, 2020 23:50
@dr-ci

dr-ci bot commented Aug 16, 2020

💊 CI failures summary and remediations

As of commit 087c13e (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed



@muthuArivoli muthuArivoli changed the title from "[WIP] Implement torch.i0 and torch.kaiser_window" to "Implement torch.i0 and torch.kaiser_window" Aug 21, 2020
@muthuArivoli muthuArivoli marked this pull request as ready for review August 21, 2020 21:49
@muthuArivoli
Contributor Author

@mruberry PTAL. This PR implements the kaiser window function. I have a few questions.

  1. The kaiser window function requires the modified Bessel function of the first kind, order 0 (i0). I went ahead and wrote an implementation for it, torch.i0, since numpy has the equivalent function np.i0. Is this alright, or should I remove the function?

  2. For the i0 function, I followed the scipy.special.i0 implementation (a wrapper around the Cephes library function) as recommended in the np documentation. However, np supports complex numbers while scipy.special does not, so should I support complex numbers? It shouldn't be too hard; I think numpy just takes the magnitude of the complex number.

  3. For testing, I modified the already existing test_signal_windows function. Should I create an entirely new test and not touch the existing test, or is it alright that I modified the existing test?
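
For context on question 1, here is a minimal NumPy-style sketch (not this PR's implementation) of the standard Kaiser window definition, showing where i0 enters:

import numpy as np

# Standard Kaiser window: w[n] = I0(beta * sqrt(1 - ((n - alpha) / alpha)^2)) / I0(beta),
# with alpha = (M - 1) / 2. This is the definition np.kaiser uses, and it is why a
# modified Bessel function of the first kind, order 0 (i0), is required.
def kaiser_window(M, beta):
    if M == 1:
        return np.ones(1)
    n = np.arange(M)
    alpha = (M - 1) / 2.0
    return np.i0(beta * np.sqrt(1 - ((n - alpha) / alpha) ** 2)) / np.i0(beta)

# Sanity check against NumPy's own implementation.
print(np.allclose(kaiser_window(10, 12.0), np.kaiser(10, 12.0)))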

@ngimel ngimel requested a review from mruberry August 24, 2020 16:34
@ngimel ngimel added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Aug 24, 2020
@mruberry
Collaborator

@mruberry PTAL. This PR implements the kaiser window function. I have a few questions.

These are excellent questions.

  1. The kaiser window function requires the modified Bessel function of the first kind, order 0 (i0). I went ahead and wrote an implementation for it, torch.i0, since numpy has the equivalent function np.i0. Is this alright, or should I remove the function?

That sounds correct but I'll take a closer look as I review (in a moment).

  2. For the i0 function, I followed the scipy.special.i0 implementation (a wrapper around the Cephes library function) as recommended in the np documentation. However, np supports complex numbers while scipy.special does not, so should I support complex numbers? It shouldn't be too hard; I think numpy just takes the magnitude of the complex number.

Complex number support can be a separable follow-up. Let's focus on real values for the moment.

  3. For testing, I modified the already existing test_signal_windows function. Should I create an entirely new test and not touch the existing test, or is it alright that I modified the existing test?

Modifying an existing test is fine. I'll take a look at the details during the review.

}

/*
* The following function has been modified from code that comes with the following copyright notice.
Collaborator

@mruberry mruberry Aug 26, 2020


I'll follow up on what the correct way to cite Cephes is.

Collaborator


I actually took some time to compare scipy.special.i0 with an approximation of the modified Bessel function of the first kind that's accurate in the limit, and I was impressed with how closely they matched on both smaller and larger values. In particular, on larger values I had to go out to 40 iterations of the approximating series before approaching the value produced by scipy.special.i0! I also compared two more modern and simpler approximations that try to fit the modified Bessel function of the first kind and was disappointed in how poorly they approximated the function.

Collaborator


So then I read this blog post: https://www.advanpix.com/2015/11/11/rational-approximations-for-the-modified-bessel-function-of-the-first-kind-i0-computations-double-precision/#id2020094592.

This was actually implemented in a modified form in Boost (see code links in the comments). I implemented their approach in Python (the correct argument is x * x / 4 for the small argument approximation, despite the blog's LaTeX showing a ceil) and was very impressed with its accuracy.

I actually really like this method because its derivation is clear and it's adaptable to different float types. How interested are you in the numerics? I think it'd be really interesting to investigate this approach further.
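
For reference, a small standalone sketch (not code from this PR or from Boost) of the I0 power series discussed above, checked against scipy.special.i0:

import numpy as np
from scipy.special import i0 as scipy_i0

# I0(x) = sum_{k >= 0} (x^2 / 4)^k / (k!)^2. Each term equals the previous term
# times (x^2 / 4) / (k + 1)^2, so the series accumulates without explicit factorials.
def i0_series(x, terms=40):
    acc, term = 0.0, 1.0
    for k in range(terms):
        acc += term
        term *= (x * x / 4.0) / ((k + 1) ** 2)
    return acc

for x in (0.5, 5.0, 20.0):
    print(x, i0_series(x), scipy_i0(x))

Varying terms and x gives a quick feel for how many series terms are needed before the sum settles at larger arguments.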

Collaborator


The copyright notice for Cephes we should use is:

Copyright (c) 2018, Steven Moshier
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL Steven Moshier BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Let's put this license in a note:

// Note [3-Clause BSD License for the Cephes Math Library]
// Code derived from implementations in the Cephes Math Library should mention its derivation and reference 
// this note (ex. 'This function is derived from the implementation of X in the Cephes Math Library. See note
// [3-Clause BSD License for the Cephes Math Library].'). The license is:
// <license goes here>

Then each function can just explain its derivation and reference the note.

Contributor Author


Where would I place this note (what directory/file)? Also, should I change the comments on the zeta and digamma functions to follow this reference style for Cephes?

Collaborator


You can replace this first comment in ATen/native/Math.h with it.

Thank you for asking. Yes, it'd be great if you would update the other Cephes references. Looks like there are existing references in ATen/native/cuda/Math.cuh, ATen/native/Math.h, and ATen/native/Distributions.h.

@muthuArivoli
Contributor Author

It seems like there actually is an accuracy problem when using bfloat16 or float16. When I increased the size of the tensor to 1000, both the bfloat16 and float16 tests fail, with a max error of 4. I'm not entirely sure why this only occurs when I increase the size to 1000 - I would expect a random sample of 100 values to fail as well, especially since the input is only in the range [0, 13.25). I'm also not sure why this only occurs on Windows/Mac but works fine on Linux; I would expect it to fail on Linux as well. Should I change the code back to using float32 for bfloat16 and float16?

@mruberry
Collaborator

mruberry commented Sep 1, 2020

It seems like there actually is an accuracy problem when using bfloat16 or float16. When I increased the size of the tensor to 1000, both the bfloat16 and float16 tests fail, with a max error of 4. I'm not entirely sure why this only occurs when I increase the size to 1000 - I would expect a random sample of 100 values to fail as well, especially since the input is only in the range [0, 13.25). I'm also not sure why this only occurs on Windows/Mac but works fine on Linux; I would expect it to fail on Linux as well. Should I change the code back to using float32 for bfloat16 and float16?

Interesting! There are a few options ... EDIT: see comment below.

* Cephes Math Library Release 2.8: June, 2000
* Copyright 1984, 1987, 1992, 2000 by Stephen L. Moshier
* This function is derived from the implementation of the digamma function in the Cephes Math Library.
* See note [3-Clause BSD License for the Cephes Math Library] in "ATen/native/Math.h".
Collaborator


No need for quotes around the path.

expected = np.nextafter(a.cpu().numpy(), b.cpu().numpy())
self.assertEqual(actual, expected, atol=0, rtol=0)

def _i0_helper(self, t, dtype):
Collaborator


Nit: this signature could be simplified by acquiring the dtype from t itself (t.dtype)


@dtypes(torch.float64)
@unittest.skipIf(not TEST_SCIPY, "SciPy not found")
def test_i0_range3(self, device, dtype):
Collaborator


Really nice test granularity and use of helpers in these tests.

@mruberry
Collaborator

mruberry commented Sep 1, 2020

Just did a quick scan. PR looks excellent. I think we just want to resolve the last design decision of how to compute for bfloat16 and float16. Let me try to enumerate the options:

  1. just compute in them unconditionally (but maybe document that the functions are inaccurate on Mac and Windows)
  2. compute in float32 but cast the intermediate values back to float16 or bfloat16 (i.e. float16 -> float32 computation -> float16)
  3. compute in float32 and return a result in float32 (like SciPy does)

Option (2) seems the safest to me. PyTorch likes to preserve dtypes where possible, and computing in float32 is easy to do (it would also let you support bfloat16 and half on both CPU and CUDA, I think, and without a custom bfloat16 vectorized implementation, even though that's very cool). Option (1) is better if it's actually faster and preserves accuracy across platforms, however.

What are your thoughts?
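
As an illustration only, a minimal sketch of option (2) under assumed names (calc_i0(float) stands in for the float32 implementation; placement is hypothetical): the narrow-dtype overloads upcast to float32, compute there, and let the return value convert back down.

#include <c10/util/Half.h>
#include <c10/util/BFloat16.h>

// Assumed float32 implementation, declared elsewhere.
float calc_i0(float x);

// Option (2): float16 -> float32 computation -> float16.
inline c10::Half calc_i0(c10::Half a) {
  return calc_i0(static_cast<float>(a));
}

// Same pattern for bfloat16; the diff excerpt quoted further down shows this overload.
inline c10::BFloat16 calc_i0(c10::BFloat16 a) {
  return calc_i0(static_cast<float>(a));
}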

@muthuArivoli
Contributor Author

Just did a quick scan. PR looks excellent. I think we just want to resolve the last design decision of how to compute for bfloat16 and float16. Let me try to enumerate the options:

  1. just compute in them unconditionally (but maybe document that the functions are inaccurate on Mac and Windows)
  2. compute in float32 but cast the intermediate values back to float16 or bfloat16 (i.e. float16 -> float32 computation -> float16)
  3. compute in float32 and return a result in float32 (like SciPy does)

Option (2) seems the safest to me. PyTorch likes to preserve dtypes where possible, and computing in float32 is easy to do (it would also let you support bfloat16 and half on both CPU and CUDA, I think, and without a custom bfloat16 vectorized implementation, even though that's very cool). Option (1) is better if it's actually faster and preserves accuracy across platforms, however.

What are your thoughts?

I agree; I think option 2 is safer. Especially since other functions are going to be built on top of this one, an error here could propagate and have unintended effects on those functions. I also think the error of 4.0 is unacceptable when computing in the lower dtypes, and since it only shows up on certain random values spread through the domain, it could be very hard for people to guard against. It could also cost people more time to cast their tensor up and then back down themselves if they are on Mac or Windows, so while people on Linux would see a performance gain, those on Mac/Windows would see a loss.

I think this could use another review now, I believe I have addressed all of the existing review comments.

return static_cast<T>(std::exp(x) * chbevl(static_cast<T>(32.0 / x - 2.0), B, 25) / std::sqrt(x));
}

inline c10::BFloat16 calc_i0(c10::BFloat16 a) { return calc_i0(static_cast<float>(a)); }
Collaborator

@mruberry mruberry Sep 1, 2020


Let's add a comment explaining the upcasting behavior (and our reasoning for it)

}

void i0_kernel_cuda(TensorIterator& iter) {
AT_DISPATCH_FLOATING_TYPES_AND_HALF(iter.dtype(), "i0_cuda", [&]() {
Collaborator


Could we add bfloat16 here and see it just work since bfloat16 uses float32 as its accumulate type?

template <> struct AccumulateType<BFloat16, true> {using type = float; };
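
A hedged sketch of what "add bfloat16 here" could look like (assumed placement; the calc_i0 call is a placeholder for the PR's device-side i0, not necessarily the final diff):

#include <ATen/Dispatch.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cuda/Loops.cuh>

void i0_kernel_cuda(at::TensorIterator& iter) {
  // Dispatch over bfloat16 and half in addition to the float types; the narrow
  // dtypes lean on the float32 upcast/accumulation discussed above.
  AT_DISPATCH_FLOATING_TYPES_AND2(
      at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "i0_cuda", [&]() {
    at::native::gpu_kernel(iter, [] GPU_LAMBDA (scalar_t a) -> scalar_t {
      return calc_i0(a);  // placeholder: device-callable i0 for scalar_t
    });
  });
}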

expected = np.nextafter(a.cpu().numpy(), b.cpu().numpy())
self.assertEqual(actual, expected, atol=0, rtol=0)

def _i0_helper(self, t):
Collaborator


Brief comment here explaining the i0 tests and why they cut up the range.

@mruberry mruberry self-requested a review September 1, 2020 20:52
Collaborator

@mruberry mruberry left a comment


This is awesome! I left a few requests for tiny comments so that future readers will have an easier time understanding the code and tests. Just ping me when this is ready.

@muthuArivoli
Contributor Author

@mruberry, I've added the comments and added bfloat16 for GPU. The ROCm test failed for some unrelated reason, so I'm not sure if bfloat16 actually works or not, but other than that I think this should be good to go.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry mruberry changed the title from "Implement torch.i0 and torch.kaiser_window" to "Implement torch.i0" Sep 2, 2020
@facebook-github-bot
Contributor

@mruberry merged this pull request in 719d29d.

@mruberry mruberry mentioned this pull request Aug 25, 2021