
Conversation

@muthuArivoli
Contributor

Related to #38349

@muthuArivoli muthuArivoli marked this pull request as draft August 16, 2020 23:50
@dr-ci

dr-ci bot commented Aug 16, 2020

💊 CI failures summary and remediations

As of commit 087c13e (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed



@muthuArivoli muthuArivoli changed the title from "[WIP] Implement torch.i0 and torch.kaiser_window" to "Implement torch.i0 and torch.kaiser_window" Aug 21, 2020
@muthuArivoli muthuArivoli marked this pull request as ready for review August 21, 2020 21:49
@muthuArivoli
Contributor Author

@mruberry PTAL. This PR implements the kaiser window function. I have a few questions.

  1. The kaiser window function requires the modified Bessel function of the first kind, order 0 (i0). I went ahead and wrote an implementation for it, torch.i0, since numpy has the equivalent function np.i0. Is this alright, or should I remove the function?

  2. For the i0 function, I followed the scipy.special.i0 implementation (a wrapper around the Cephes library function) as recommended in the np documentation. However, np supports complex numbers while scipy.special does not, so should I support complex numbers? It shouldn't be too hard; I think numpy just takes the magnitude of the complex number.

  3. For testing, I modified the already existing test_signal_windows function. Should I create an entirely new test and not touch the existing test, or is it alright that I modified the existing test?
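
For context on question 1, here is a minimal NumPy-style sketch (not this PR's implementation) of the standard Kaiser window definition, showing where i0 enters:

import numpy as np

# Standard Kaiser window: w[n] = I0(beta * sqrt(1 - ((n - alpha) / alpha)^2)) / I0(beta),
# with alpha = (M - 1) / 2. This is the definition np.kaiser uses, and it is why a
# modified Bessel function of the first kind, order 0 (i0), is required.
def kaiser_window(M, beta):
    if M == 1:
        return np.ones(1)
    n = np.arange(M)
    alpha = (M - 1) / 2.0
    return np.i0(beta * np.sqrt(1 - ((n - alpha) / alpha) ** 2)) / np.i0(beta)

# Sanity check against NumPy's own implementation.
print(np.allclose(kaiser_window(10, 12.0), np.kaiser(10, 12.0)))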

@ngimel ngimel requested a review from mruberry August 24, 2020 16:34
@ngimel ngimel added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Aug 24, 2020
@mruberry
Collaborator

@mruberry PTAL. This PR implements the kaiser window function. I have a few questions.

These are excellent questions.

  1. The kaiser window function requires the modified Bessel function of the first kind, order 0 (i0). I went ahead and wrote an implementation for it, torch.i0, since numpy has the equivalent function np.i0. Is this alright, or should I remove the function?

That sounds correct but I'll take a closer look as I review (in a moment).

  2. For the i0 function, I followed the scipy.special.i0 implementation (a wrapper around the Cephes library function) as recommended in the np documentation. However, np supports complex numbers while scipy.special does not, so should I support complex numbers? It shouldn't be too hard; I think numpy just takes the magnitude of the complex number.

Complex number support can be a separable follow-up. Let's focus on real values for the moment.

  3. For testing, I modified the already existing test_signal_windows function. Should I create an entirely new test and not touch the existing test, or is it alright that I modified the existing test?

Modifying an existing test is fine. I'll take a look at the details during the review.

}

/*
* The following function has been modified from code that comes with the following copyright notice.
Collaborator

@mruberry mruberry Aug 26, 2020


I'll follow up on what the correct way to cite Cephes is.

Collaborator


I actually took some time to compare scipy.special.i0 with an approximation of the modified Bessel function of the first kind that's accurate in the limit, and I was impressed with how closely they matched on both smaller and larger values. In particular, on larger values I had to go out to 40 iterations of the approximating series before approaching the value produced by scipy.special.i0! I also compared two more modern and simpler approximations that try to fit the modified Bessel function of the first kind and was disappointed in how poorly they approximated the function.

Collaborator


So then I read this blog post: https://www.advanpix.com/2015/11/11/rational-approximations-for-the-modified-bessel-function-of-the-first-kind-i0-computations-double-precision/#id2020094592.

This was actually implemented in a modified form in Boost (see code links in the comments). I implemented their approach in Python (the correct argument is x * x / 4 for the small argument approximation, despite the blog's LaTeX showing a ceil) and was very impressed with its accuracy.

I actually really like this method because its derivation is clear and it's adaptable to different float types. How interested are you in the numerics? I think it'd be really interesting to investigate this approach further.
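
For reference, a small standalone sketch (not code from this PR or from Boost) of the I0 power series discussed above, checked against scipy.special.i0:

import numpy as np
from scipy.special import i0 as scipy_i0

# I0(x) = sum_{k >= 0} (x^2 / 4)^k / (k!)^2. Each term equals the previous term
# times (x^2 / 4) / (k + 1)^2, so the series accumulates without explicit factorials.
def i0_series(x, terms=40):
    acc, term = 0.0, 1.0
    for k in range(terms):
        acc += term
        term *= (x * x / 4.0) / ((k + 1) ** 2)
    return acc

for x in (0.5, 5.0, 20.0):
    print(x, i0_series(x), scipy_i0(x))

Varying terms and x gives a quick feel for how many series terms are needed before the sum settles at larger arguments.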

Collaborator


The copyright notice for Cephes we should use is:

Copyright (c) 2018, Steven Moshier
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL Steven Moshier BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Let's put this license in a note:

// Note [3-Clause BSD License for the Cephes Math Library]
// Code derived from implementations in the Cephes Math Library should mention its derivation and reference 
// this note (ex. 'This function is derived from the implementation of X in the Cephes Math Library. See note
// [3-Clause BSD License for the Cephes Math Library].'). The license is:
// <license goes here>

Then each function can just explain its derivation and reference the note.

Contributor Author


Where would I place this note (what directory/file)? Also, should I change the comments on the zeta and digamma functions to follow this reference style for Cephes?

Collaborator


You can replace this first comment in ATen/native/Math.h with it.

Thank you for asking. Yes, it'd be great if you would update the other Cephes references. Looks like there are existing references in ATen/native/cuda/Math.cuh, ATen/native/Math.h, and ATen/native/Distributions.h.

@muthuArivoli
Contributor Author

It seems like there actually is an accuracy problem when using bfloat16 or float16. When I increased the size of the tensor to 1000, both the bfloat16 and float16 tests fail, with a max error of 4. I'm not entirely sure why this only occurs when I increase the size to 1000 - I would expect a random sample of 100 values to fail as well, especially since the input is only in the range [0, 13.25). I'm also not sure why this only occurs on Windows/Mac but works fine on Linux; I would expect it to fail on Linux as well. Should I change the code back to using float32 for bfloat16 and float16?

@mruberry
Collaborator

mruberry commented Sep 1, 2020

It seems like there actually is an accuracy problem when using bfloat16 or float16. When I increased the size of the tensor to 1000, both the bfloat16 and float16 tests fail, with a max error of 4. I'm not entirely sure why this only occurs when I increase the size to 1000 - I would expect a random sample of 100 values to fail as well, especially since the input is only in the range [0, 13.25). I'm also not sure why this only occurs on Windows/Mac but works fine on Linux; I would expect it to fail on Linux as well. Should I change the code back to using float32 for bfloat16 and float16?

Interesting! There are a few options ... EDIT: see comment below.

* Cephes Math Library Release 2.8: June, 2000
* Copyright 1984, 1987, 1992, 2000 by Stephen L. Moshier
* This function is derived from the implementation of the digamma function in the Cephes Math Library.
* See note [3-Clause BSD License for the Cephes Math Library] in "ATen/native/Math.h".
Collaborator


No need for quotes around the path.

expected = np.nextafter(a.cpu().numpy(), b.cpu().numpy())
self.assertEqual(actual, expected, atol=0, rtol=0)

def _i0_helper(self, t, dtype):
Collaborator


Nit: this signature could be simplified by acquiring the dtype from t itself (t.dtype)


@dtypes(torch.float64)
@unittest.skipIf(not TEST_SCIPY, "SciPy not found")
def test_i0_range3(self, device, dtype):
Collaborator


Really nice test granularity and use of helpers in these tests.

@mruberry
Collaborator

mruberry commented Sep 1, 2020

Just did a quick scan. PR looks excellent. I think we just want to resolve the last design decision of how to compute for bfloat16 and float16. Let me try to enumerate the options:

  1. just compute in them unconditionally (but maybe document that the functions are inaccurate on Mac and Windows)
  2. compute in float32 but cast the intermediate values back to float16 or bfloat16 (i.e. float16 -> float32 computation -> float16)
  3. compute in float32 and return a result in float32 (like SciPy does)

Option (2) seems the safest to me. PyTorch likes to preserve dtypes where possible, and computing in float32 is easy to do (it would also let you support bfloat16 and half on both CPU and CUDA, I think, and without a custom bfloat16 vectorized implementation, even though that's very cool). Option (1) is better if it's actually faster and preserves accuracy across platforms, however.

What are your thoughts?
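
As an illustration only, a minimal sketch of option (2) under assumed names (calc_i0(float) stands in for the float32 implementation; placement is hypothetical): the narrow-dtype overloads upcast to float32, compute there, and let the return value convert back down.

#include <c10/util/Half.h>
#include <c10/util/BFloat16.h>

// Assumed float32 implementation, declared elsewhere.
float calc_i0(float x);

// Option (2): float16 -> float32 computation -> float16.
inline c10::Half calc_i0(c10::Half a) {
  return calc_i0(static_cast<float>(a));
}

// Same pattern for bfloat16; the diff excerpt quoted further down shows this overload.
inline c10::BFloat16 calc_i0(c10::BFloat16 a) {
  return calc_i0(static_cast<float>(a));
}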

@muthuArivoli
Contributor Author

Just did a quick scan. PR looks excellent. I think we just want to resolve the last design decision of how to compute for bfloat16 and float16. Let me try to enumerate the options:

  1. just compute in them unconditionally (but maybe document that the functions are inaccurate on Mac and Windows)
  2. compute in float32 but cast the intermediate values back to float16 or bfloat16 (i.e. float16 -> float32 computation -> float16)
  3. compute in float32 and return a result in float32 (like SciPy does)

Option (2) seems the safest to me. PyTorch likes to preserve dtypes where possible, and computing in float32 is easy to do (it would also let you support bfloat16 and half on both CPU and CUDA, I think, and without a custom bfloat16 vectorized implementation, even though that's very cool). Option (1) is better if it's actually faster and preserves accuracy across platforms, however.

What are your thoughts?

I agree; I think option 2 is safer. Especially since other functions are going to be built on top of this one, an error here could propagate and have unintended effects on those functions. I also think the error of 4.0 is unacceptable when computing in the lower dtypes, and since it only shows up on certain random values spread through the domain, it could be very hard for people to guard against. It could also cost people more time to cast their tensor up and then back down themselves if they are on Mac or Windows, so while people on Linux would see a performance gain, those on Mac/Windows would see a loss.

I think this could use another review now, I believe I have addressed all of the existing review comments.

return static_cast<T>(std::exp(x) * chbevl(static_cast<T>(32.0 / x - 2.0), B, 25) / std::sqrt(x));
}

inline c10::BFloat16 calc_i0(c10::BFloat16 a) { return calc_i0(static_cast<float>(a)); }
Collaborator

@mruberry mruberry Sep 1, 2020


Let's add a comment explaining the upcasting behavior (and our reasoning for it)

}

void i0_kernel_cuda(TensorIterator& iter) {
AT_DISPATCH_FLOATING_TYPES_AND_HALF(iter.dtype(), "i0_cuda", [&]() {
Collaborator


Could we add bfloat16 here and see it just work since bfloat16 uses float32 as its accumulate type?

template <> struct AccumulateType<BFloat16, true> {using type = float; };
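
A hedged sketch of what "add bfloat16 here" could look like (assumed placement; the calc_i0 call is a placeholder for the PR's device-side i0, not necessarily the final diff):

#include <ATen/Dispatch.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cuda/Loops.cuh>

void i0_kernel_cuda(at::TensorIterator& iter) {
  // Dispatch over bfloat16 and half in addition to the float types; the narrow
  // dtypes lean on the float32 upcast/accumulation discussed above.
  AT_DISPATCH_FLOATING_TYPES_AND2(
      at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "i0_cuda", [&]() {
    at::native::gpu_kernel(iter, [] GPU_LAMBDA (scalar_t a) -> scalar_t {
      return calc_i0(a);  // placeholder: device-callable i0 for scalar_t
    });
  });
}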

expected = np.nextafter(a.cpu().numpy(), b.cpu().numpy())
self.assertEqual(actual, expected, atol=0, rtol=0)

def _i0_helper(self, t):
Collaborator


Brief comment here explaining the i0 tests and why they cut up the range.

@mruberry mruberry self-requested a review September 1, 2020 20:52
Collaborator

@mruberry mruberry left a comment


This is awesome! I left a few requests for tiny comments so that future readers will have an easier time understanding the code and tests. Just ping me when this is ready.

@muthuArivoli
Contributor Author

@mruberry, I've added the comments and added bfloat16 for GPU. The ROCm test failed for some unrelated reason, so I'm not sure if bfloat16 actually works or not, but other than that I think this should be good to go.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry mruberry changed the title from "Implement torch.i0 and torch.kaiser_window" to "Implement torch.i0" Sep 2, 2020
@facebook-github-bot
Contributor

@mruberry merged this pull request in 719d29d.

@mruberry mruberry mentioned this pull request Aug 25, 2021