Skip to content

Conversation

@kshitij12345
Copy link
Collaborator

@kshitij12345 kshitij12345 commented Mar 8, 2022

Introduce jiterator_code_stringify to reduce duplication of kernel code used with jiterator.

@pytorch-bot
Copy link

pytorch-bot bot commented Mar 8, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/kshitij12345/pytorch/blob/31e0cded584513d751e375ef903b1daa2bf637a9/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk ✅ triggered
linux-binary-manywheel ciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build ciflow/all, ciflow/cpu, ciflow/default, ciflow/libtorch, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-debug ciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk ✅ triggered
windows-binary-libtorch-release ciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk ✅ triggered
windows-binary-wheel ciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla 🚫 skipped

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Mar 8, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 07dc80a (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@@ -0,0 +1,33 @@
#pragma once
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a new file as jit_macros.h includes CUDAConfig.h which is only available in CUDA build.

Copy link
Collaborator

@lezcano lezcano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see the direction this is taking! Now, I have one question really. I don't see the equivalent of some bits of code that were removed in this PR. Where did those go?

I also added a small performance nit that could be fixed in this PR or a separate PR.

T x = fabs(_x);

if (x <= T{8.0}) {
T coefficients[] = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit unrelated to this PR. Make this constexp or, at the very least, static (provided cuda allows..)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks!

Comment on lines -1388 to -1417
/*
* This function is derived from the implementation of the i0e function in the Cephes Math Library.
* See note [3-Clause BSD License for the Cephes Math Library].
*
* Computes an approximation of the exponentially scaled zeroth order modified Bessel function of the first kind.
* The approximation is actually two (sub)approximations, both using a Chebyshev polynomial expansion.
* One approximates the function over [0, 8], and the other over (8, infinity). This function takes the absolute value
* of all inputs to convert them into the domain of the approximation.
*/
template <typename T>
static inline typename std::enable_if<std::is_floating_point<T>::value, T>::type
calc_i0e(T _x) {
T x = std::abs(_x);

if (x <= T{8.0}) {
auto coeff_pair = chebyshev_coefficients_i0e_A<T>();
auto A = std::get<0>(coeff_pair);
auto len = std::get<1>(coeff_pair);
T y = (x / T{2.0}) - T{2.0};
return chbevl(y, A, len);
}

auto coeff_pair = chebyshev_coefficients_i0e_B<T>();
auto B = std::get<0>(coeff_pair);
auto len = std::get<1>(coeff_pair);
return chbevl(T{32.0} / x - T{2.0}, B, len) / std::sqrt(x);
}

// Upcast bfloat16 input to float for numerical accuracy purposes
static inline c10::BFloat16 calc_i0e(c10::BFloat16 a) { return calc_i0e(static_cast<float>(a)); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did all this code go?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jiterated version of the code has the coefficients (which we get from chebyshev_coefficients_i0e_B and chebyshev_coefficients_i0e_A)
And BFloat16 upcasting here was redundant as it is handled at other places.

Existing tests in test_unary_ufuncs verify the correctness against scipy implementation for CPU and CUDA.

Comment on lines -1512 to -1528
template <typename scalar_t>
static inline C10_HOST_DEVICE scalar_t calc_i0e(scalar_t _x) {
static_assert(!std::is_same<scalar_t, Half>() && !std::is_same<scalar_t, BFloat16>(), "don't instantiate with low precision type");
scalar_t x = ::abs(_x);
if (x <= scalar_t{8.0}) {
auto coeff_pair = chebyshev_coefficients_i0e_A<scalar_t>();
auto A = std::get<0>(coeff_pair);
auto len = std::get<1>(coeff_pair);
scalar_t y = (x / scalar_t{2.0}) - scalar_t{2.0};
return (chbevl(y, A, len));
}

auto coeff_pair = chebyshev_coefficients_i0e_B<scalar_t>();
auto B = std::get<0>(coeff_pair);
auto len = std::get<1>(coeff_pair);
return (chbevl(scalar_t{32.0} / x - scalar_t{2.0}, B, len) / ::sqrt(x));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I don't see what's the equivalent of this code in the new PR.

@kshitij12345 kshitij12345 force-pushed the jiterator/reduce-code-duplication branch from bc72491 to 6f06a62 Compare March 10, 2022 12:16
@kshitij12345 kshitij12345 requested a review from mruberry March 10, 2022 17:54
* function takes the absolute value of all inputs to convert them into the
* domain of the approximation.
*/
jiterator_code_stringify(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future I would write jiterator_code( on the same line as jiterator_code_stringify to avoid the extra level of indentation

#if defined(__CUDACC__)
// CPU and CUDA case
#define stringify_code(...) #__VA_ARGS__
#define jiterator_code_stringify(code, str_name) \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming suggestion: jiterator_also_stringify_as

that might make the fact that the code is preserved clearer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@mruberry
Copy link
Collaborator

Looks pretty good to me -- any reason this is still in draft?

cc @anjali411

@kshitij12345 kshitij12345 marked this pull request as ready for review March 14, 2022 09:50
@kshitij12345
Copy link
Collaborator Author

any reason this is still in draft?

Forgot to mark it as ready 😅

@mruberry
Copy link
Collaborator

any reason this is still in draft?

Forgot to mark it as ready 😅

No worries -- just tweak the name and ping me when this is ready to merge

@kshitij12345
Copy link
Collaborator Author

@mruberry have addressed the review. Should be ready once the CI is green. Thanks :)!

@facebook-github-bot
Copy link
Contributor

@mruberry has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@samdow samdow added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Mar 14, 2022
@kshitij12345
Copy link
Collaborator Author

Ping @mruberry

@kshitij12345
Copy link
Collaborator Author

Ping @mruberry

@mruberry
Copy link
Collaborator

@pytorchbot merge this please

@pytorchmergebot
Copy link
Collaborator

Merge failed due to PR 73908 does not match merge rules
Raised by https://github.com/pytorch/pytorch/actions/runs/2053015597

facebook-github-bot pushed a commit that referenced this pull request Mar 28, 2022
Summary:
Introduce `jiterator_code_stringify` to reduce duplication of kernel code used with jiterator.

Pull Request resolved: #73908

Reviewed By: ngimel

Differential Revision: D34858716

Pulled By: mruberry

fbshipit-source-id: f87a34e4966b31620bbc5c7d93f0387fc1980ded
@github-actions
Copy link
Contributor

Hey @kshitij12345.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed module: jiterator open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants