[jiterator] reduce kernel code duplication #73908

kshitij12345 · 2022-03-08T07:15:59Z

Introduce jiterator_code_stringify to reduce duplication of kernel code used with jiterator.

pytorch-bot · 2022-03-08T07:16:03Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/kshitij12345/pytorch/blob/31e0cded584513d751e375ef903b1daa2bf637a9/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
linux-binary-libtorch-cxx11-abi	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`, `ciflow/trunk`	✅ triggered
linux-binary-libtorch-pre-cxx11	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`, `ciflow/trunk`	✅ triggered
linux-binary-manywheel	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`, `ciflow/trunk`	✅ triggered
linux-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/trunk`	✅ triggered
linux-bionic-rocm4.5-py3.7	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/rocm`, `ciflow/trunk`	✅ triggered
linux-docs	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/docs`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-vulkan-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
macos-arm64-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
macos-arm64-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
macos-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
macos-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
macos-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
macos-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
windows-binary-libtorch-debug	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`, `ciflow/trunk`	✅ triggered
windows-binary-libtorch-release	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`, `ciflow/trunk`	✅ triggered
windows-binary-wheel	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`, `ciflow/trunk`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
docker-builds	`ciflow/all`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`, `ciflow/trunk`	🚫 skipped
linux-docs-push	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-11-py3-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.5-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`, `ciflow/xla`	🚫 skipped

facebook-github-bot · 2022-03-08T07:16:05Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/73908
📄 Preview docs built from this PR
📄 Preview C++ docs built from this PR
🔧 Opt-in to CIFlow to control what jobs run on your PRs

💊 CI failures summary and remediations

As of commit 07dc80a (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚


kshitij12345 · 2022-03-08T07:46:32Z

aten/src/ATen/jiterator_macros.h

@@ -0,0 +1,33 @@
+#pragma once


Moved to a new file because jit_macros.h includes CUDAConfig.h, which is only available in CUDA builds.

lezcano

Nice to see the direction this is taking! Now, I have one question really. I don't see the equivalent of some bits of code that were removed in this PR. Where did those go?

I also added a small performance nit that could be fixed in this PR or a separate PR.

lezcano · 2022-03-10T11:20:21Z

aten/src/ATen/native/Math.h

+          T x = fabs(_x);
+
+          if (x <= T{8.0}) {
+            T coefficients[] = {


Nit unrelated to this PR: make this constexpr or, at the very least, static (provided CUDA allows it).

Makes sense. Thanks!
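
For readers following the nit above, here is a minimal host-side sketch of what a `static constexpr` coefficient table could look like. The names `chbevl_sketch` and `eval_sketch` and the coefficient values are hypothetical placeholders, not the code or the Cephes coefficients from this PR, and whether the same qualifier is accepted in CUDA device code is not checked here.

```cpp
#include <cassert>
#include <cstddef>

// Clenshaw-style recurrence in the convention of the Cephes `chbevl` routine:
// coefficients are stored highest order first and the result is 0.5 * (b0 - b2).
template <typename T>
T chbevl_sketch(T x, const T* coeffs, std::size_t len) {
  T b0 = coeffs[0];
  T b1 = T(0);
  T b2 = T(0);
  for (std::size_t i = 1; i < len; ++i) {
    b2 = b1;
    b1 = b0;
    b0 = x * b1 - b2 + coeffs[i];
  }
  return T(0.5) * (b0 - b2);
}

template <typename T>
T eval_sketch(T x) {
  // Placeholder values, NOT the real i0e coefficients.
  // `static constexpr` makes the table a compile-time constant, so it is
  // not rebuilt as a fresh local array on every call.
  static constexpr T coefficients[] = {T(2), T(4), T(6)};
  return chbevl_sketch(
      x, coefficients, sizeof(coefficients) / sizeof(coefficients[0]));
}
```

With these placeholder coefficients, `eval_sketch(2.0)` evaluates the recurrence over three terms; the point is only the storage qualifier on the array, not the numbers.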

lezcano · 2022-03-10T11:21:47Z

aten/src/ATen/native/Math.h

-/*
- * This function is derived from the implementation of the i0e function in the Cephes Math Library.
- * See note [3-Clause BSD License for the Cephes Math Library].
- *
- * Computes an approximation of the exponentially scaled zeroth order modified Bessel function of the first kind.
- * The approximation is actually two (sub)approximations, both using a Chebyshev polynomial expansion.
- * One approximates the function over [0, 8], and the other over (8, infinity). This function takes the absolute value
- * of all inputs to convert them into the domain of the approximation.
- */
-template <typename T>
-static inline typename std::enable_if<std::is_floating_point<T>::value, T>::type
-calc_i0e(T _x) {
-  T x = std::abs(_x);
-
-  if (x <= T{8.0}) {
-    auto coeff_pair = chebyshev_coefficients_i0e_A<T>();
-    auto A = std::get<0>(coeff_pair);
-    auto len = std::get<1>(coeff_pair);
-    T y = (x / T{2.0}) - T{2.0};
-    return chbevl(y, A, len);
-  }
-
-  auto coeff_pair = chebyshev_coefficients_i0e_B<T>();
-  auto B = std::get<0>(coeff_pair);
-  auto len = std::get<1>(coeff_pair);
-  return chbevl(T{32.0} / x - T{2.0}, B, len) / std::sqrt(x);
-}
-
-// Upcast bfloat16 input to float for numerical accuracy purposes
-static inline c10::BFloat16 calc_i0e(c10::BFloat16 a) { return calc_i0e(static_cast<float>(a)); }


Where did all this code go?

The jiterated version of the code contains the coefficients directly (the ones we previously got from chebyshev_coefficients_i0e_B and chebyshev_coefficients_i0e_A).
The BFloat16 upcasting here was redundant, as it is handled elsewhere.

Existing tests in test_unary_ufuncs verify the correctness against scipy implementation for CPU and CUDA.
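
As a reading aid for the removed `calc_i0e`, the sketch below isolates only the argument mapping of its two branches, taken from the expressions `(x / 2) - 2` and `32 / x - 2` in the diff above. The helper name `i0e_chebyshev_argument` is hypothetical and does not exist in PyTorch; the Chebyshev evaluation itself is left out.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: returns the value that would be handed to the
// Chebyshev evaluation, following the two branches of the removed calc_i0e.
template <typename T>
T i0e_chebyshev_argument(T input) {
  T x = std::abs(input);  // the approximation is defined for |input|
  if (x <= T{8.0}) {
    return (x / T{2.0}) - T{2.0};  // maps [0, 8] onto [-2, 2]
  }
  return T{32.0} / x - T{2.0};  // maps (8, inf) into (-2, 2)
}
```

Both branches land in the same interval, which is why a single `chbevl`-style routine can serve both coefficient sets; the second branch additionally divides the result by `sqrt(x)` in the original code.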

lezcano · 2022-03-10T11:22:19Z

aten/src/ATen/native/cuda/Math.cuh

-template <typename scalar_t>
-static inline C10_HOST_DEVICE scalar_t calc_i0e(scalar_t _x) {
-  static_assert(!std::is_same<scalar_t, Half>() && !std::is_same<scalar_t, BFloat16>(), "don't instantiate with low precision type");
-  scalar_t x = ::abs(_x);
-  if (x <= scalar_t{8.0}) {
-    auto coeff_pair = chebyshev_coefficients_i0e_A<scalar_t>();
-    auto A = std::get<0>(coeff_pair);
-    auto len = std::get<1>(coeff_pair);
-    scalar_t y = (x / scalar_t{2.0}) - scalar_t{2.0};
-    return (chbevl(y, A, len));
-  }
-
-  auto coeff_pair = chebyshev_coefficients_i0e_B<scalar_t>();
-  auto B = std::get<0>(coeff_pair);
-  auto len = std::get<1>(coeff_pair);
-  return (chbevl(scalar_t{32.0} / x - scalar_t{2.0}, B, len) / ::sqrt(x));
-}


Same here: I don't see the equivalent of this code in the new PR.

mruberry · 2022-03-14T09:08:10Z

aten/src/ATen/native/Math.h

+ * function takes the absolute value of all inputs to convert them into the
+ * domain of the approximation.
+ */
+jiterator_code_stringify(


In the future I would write jiterator_code( on the same line as jiterator_code_stringify to avoid the extra level of indentation.

mruberry · 2022-03-14T09:14:52Z

aten/src/ATen/jiterator_macros.h

+#if defined(__CUDACC__)
+    // CPU and CUDA case
+    #define stringify_code(...) #__VA_ARGS__
+    #define jiterator_code_stringify(code, str_name)                    \


naming suggestion: jiterator_also_stringify_as

that might make the fact that the code is preserved clearer?
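
To illustrate the "code is preserved and also stringified" idea discussed here, a minimal sketch follows. The macro names `SKETCH_STRINGIFY` and `SKETCH_ALSO_STRINGIFY_AS` and the function `twice` are hypothetical; they are not the definitions in aten/src/ATen/jiterator_macros.h, whose full body is not shown in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical names: these macros only illustrate the idea and are not
// the definitions added in aten/src/ATen/jiterator_macros.h.
#define SKETCH_STRINGIFY(...) #__VA_ARGS__

// Emits `code` as ordinary C++ and also stores its text in `str_name`.
// Caveat: a top-level comma inside `code` would split the macro argument.
#define SKETCH_ALSO_STRINGIFY_AS(code, str_name) \
  code                                            \
  const std::string str_name = SKETCH_STRINGIFY(code);

// One definition yields both a callable function and its source text.
SKETCH_ALSO_STRINGIFY_AS(
    template <typename T> T twice(T x) { return x + x; },
    twice_string)

// Small self-check showing that both forms are available.
inline void check_sketch() {
  assert(twice(21) == 42);
  assert(twice_string.find("twice(T x)") != std::string::npos);
}
```

The compiled function can be used on the host path, while the string form is the kind of input a runtime compiler would take; keeping both behind one macro is what removes the need to maintain two copies of the kernel body.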

mruberry · 2022-03-14T09:15:12Z

Looks pretty good to me -- any reason this is still in draft?

cc @anjali411

kshitij12345 · 2022-03-14T09:50:41Z

any reason this is still in draft?

Forgot to mark it as ready 😅

mruberry · 2022-03-14T09:51:47Z

any reason this is still in draft?

Forgot to mark it as ready 😅

No worries -- just tweak the name and ping me when this is ready to merge

kshitij12345 · 2022-03-14T09:59:47Z

@mruberry I have addressed the review. Should be ready once the CI is green. Thanks :)!

facebook-github-bot · 2022-03-14T10:01:26Z

@mruberry has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

kshitij12345 · 2022-03-17T06:22:29Z

Ping @mruberry

kshitij12345 · 2022-03-23T11:38:06Z

Ping @mruberry

mruberry · 2022-03-28T14:45:01Z

@pytorchbot merge this please

pytorchmergebot · 2022-03-28T14:46:18Z

Merge failed due to PR 73908 does not match merge rules
Raised by https://github.com/pytorch/pytorch/actions/runs/2053015597

Summary: Introduce `jiterator_code_stringify` to reduce duplication of kernel code used with jiterator. Pull Request resolved: #73908 Reviewed By: ngimel Differential Revision: D34858716 Pulled By: mruberry fbshipit-source-id: f87a34e4966b31620bbc5c7d93f0387fc1980ded

github-actions · 2022-03-28T15:01:49Z

Hey @kshitij12345.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

[jiterator] reduce kernel code duplication

31e0cde

pytorch-bot bot added the ciflow/default label Mar 8, 2022

facebook-github-bot added the cla signed label Mar 8, 2022

pytorchbot added the open source label Mar 8, 2022

kshitij12345 added 2 commits March 8, 2022 07:41

seperate from jit_macros as it imports CUDAConfig

3340dab

remove unnecessary include

9fb5345

kshitij12345 commented Mar 8, 2022


kshitij12345 added 7 commits March 8, 2022 07:55

update preprocessor condition and update comment

31587a6

fix for ROCM

5ba783b

try fix for windows

7f7ff4e

remove unnecessary overload

a7ea40e

try to fix windows fail

9946c67

update comment

f219f50

Merge branch 'master' into jiterator/reduce-code-duplication

d0bbd74

kshitij12345 requested review from lezcano and peterbell10 March 10, 2022 11:07

lezcano reviewed Mar 10, 2022


make coeff static const

6f06a62

kshitij12345 force-pushed the jiterator/reduce-code-duplication branch from bc72491 to 6f06a62 Compare March 10, 2022 12:16

kshitij12345 requested a review from mruberry March 10, 2022 17:54

mruberry reviewed Mar 14, 2022


kshitij12345 marked this pull request as ready for review March 14, 2022 09:50

kshitij12345 added 2 commits March 14, 2022 09:57

address review

7e130b3

Merge branch 'master' into jiterator/reduce-code-duplication

07dc80a

samdow added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Mar 14, 2022

kshitij12345 added the module: jiterator label Mar 15, 2022

suo removed the ciflow/default label Mar 22, 2022

pytorchmergebot closed this in c96ced8 Mar 28, 2022

kshitij12345 mentioned this pull request Mar 29, 2022

Implement torch.special.log_ndtr #74795

Closed

3 tasks

WBobby mentioned this pull request Aug 17, 2022

Add ROCm5.2.3/AMDGPU support for PyTorch WBobby/pytorch#2

Closed
