Refactor THCNumerics and add common math functions for at::Half #10301
Conversation
facebook-github-bot left a comment
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
ROCm failure is not directly your fault, but it is probably real.
aten/src/ATen/core/Half-inl.h
Outdated
```cpp
// compatibility

inline AT_HOSTDEVICE at::Half lgamma(at::Half a) {
  return (at::Half)lgammaf((float)a);
```
To fix the ROCm error, I'd advise just disabling the Half related functionality when the ROCm build is enabled. Half seems to cause a lot of problems for the ROCm toolchain.
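A minimal sketch of that kind of guard, reusing the `lgamma` specialization shown in the review context above. The `__HIP_PLATFORM_HCC__` macro is assumed to be what the ROCm/HIP toolchain of that era defines; this is not necessarily the guard the PR ended up using.

```cpp
// Hedged sketch (for Half-inl.h): only compile the at::Half math
// specializations when not building for ROCm/HIP. __HIP_PLATFORM_HCC__ is
// assumed to be defined by the HIP toolchain; the actual guard may differ.
#if !defined(__HIP_PLATFORM_HCC__)
inline AT_HOSTDEVICE at::Half lgamma(at::Half a) {
  return (at::Half)lgammaf((float)a);
}
#endif  // !defined(__HIP_PLATFORM_HCC__)
```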
ezyang left a comment
Looks pretty reasonable to me, but build failures will have to be worked around somehow.
aten/src/ATen/core/Half-inl.h
Outdated
```cpp
// at all. However, keeping all the function definitions to provide backward
// compatibility

inline AT_HOSTDEVICE at::Half lgamma(at::Half a) {
```
aten/src/ATen/cuda/CUDANumerics.cuh
Outdated
```cpp
}

template <>
struct numerics<uint8_t> {
```
aten/src/ATen/cuda/CUDANumerics.cuh
Outdated
```cpp
};

template <typename scalar_t>
static inline __host__ __device__ scalar_t powi(scalar_t a, scalar_t b) {
```
aten/src/ATen/native/cuda/SoftMax.cu
Outdated
```diff
  // find the max
  accscalar_t threadMax = ilpReduce<MaxFloat, ILP, scalar_t, accscalar_t>(
-     input, classes, MaxFloat<scalar_t, accscalar_t>(), -THCNumerics<accscalar_t>::max());
+     input, classes, MaxFloat<scalar_t, accscalar_t>(), -std::numeric_limits<accscalar_t>::max());
```
@ezyang @colesbury Thanks for the review. Will follow up with fixes.
Change list:
colesbury left a comment
Looks pretty good. Thanks for cleaning up these headers. Just a few small comments.
aten/src/ATen/cuda/CUDANumerics.cuh
Outdated
```cpp
#include <cuda.h>
#include <limits.h>

// CUDANumerics.cuh is a holder for mathematical functions that are either
```
aten/src/ATen/cuda/CUDANumerics.cuh
Outdated
```cpp
template <>
struct numeric_limits<uint8_t> {
  static inline __host__ __device__ uint8_t lowest() { return 0; }
  static inline __host__ __device__ uint8_t max() { return UCHAR_MAX; }
```
```cpp
// half API for the common mathematical functions.
// Note: When calling std math functions from device, don't
// use the std namespace, but just "::" so that the function
// gets resolved from nvcc math_functions.hpp
```
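As a small illustration of that convention (a hypothetical helper, not code from this PR): from device code, call the global-namespace overload via `::` so nvcc resolves it from its own math headers rather than requiring a `__device__` overload in `std::`.

```cpp
// Hypothetical helper illustrating the convention described in the comment
// above: "::sqrtf" is resolved by nvcc's device math headers, whereas
// "std::sqrt" may not have a __device__ overload available.
__device__ float inv_norm(float x, float sum_of_squares) {
  return x / ::sqrtf(sum_of_squares);
}
```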
Looks like the two failing builds are not related to this PR? Are there any more changes needed for it?
@pytorchbot retest this please
facebook-github-bot left a comment
soumith has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
facebook-github-bot left a comment
soumith has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot retest this please
How is the internal CI looking for this PR?
Looks good, I'll reimport and land.
Summary:

**Summary**: This PR is a followup of mruberry's pytorch/pytorch#9318. It tries to achieve the following:

- Specializing std common math functions for the `at::Half` type.
- Create `CUDANumerics.cuh` to contain necessary parts from `THCNumerics.cuh`.
- Update `THCNumerics.cuh` with new usage and comments to demonstrate the best practice for developers and hence, making way for its deprecation.
- Remove legacy/redundant code paths.
- Remove unused CUDA HALF macros (see separate PR pytorch/pytorch#10147).

**Comments**: `CUDANumerics.cuh` contains mathematical functions that are either not in the std namespace or are specialized for compilation with CUDA NVCC or CUDA NVRTC. This header is derived from the legacy `THCNumerics.cuh`. Following is some of the rationale behind why some functions were kept while others were removed:

- All arithmetic can now be done in ATen using a binary CUDA kernel or CUDA tensor pointwise apply (check pytorch/pytorch#8919 and `CUDAApplyUtils`). `at::Half` comparisons rely on implicit conversion to float.
- Functions that are C/C++ standard compliant have been specialized for user-defined types; for instance, the std namespace has been opened up for `at::Half`, with math function definitions for `at::Half`. Check `Half-inl.h`.
- Some standard-compliant functions are specialized here for performance reasons. For instance, `powi` is used for `pow` calculation on integral types (see the sketch after this summary). Moreover, `abs`, `isinf`, and `isnan` are specialized to save one API call versus when used with std, although this is subject to change, depending on whether we really care about saving one API call.
- Numeric limits such as `max`/`min` are removed since they call standard defines. Moreover, numeric limits for `at::Half` are present in `Half-inl.h`. I understood that HIP has some issue with `std::numeric_limits`, and this is the related GitHub issue I found: ROCm/hip#374. AlexVlx mentions that the issue can be avoided by calling `std::numeric_limits` in `__device__`. Since we are launching lambdas with device contexts, I don't see why `std::numeric_limits` wouldn't compile on HIP if used within a device context in a kernel, unless I am not aware of the real reason why max/min was there in THCNumerics in the first place. (I haven't ever tried a build with HIP.)

Here are some reference PRs that were handy in refactoring TH into ATen:

- pytorch/pytorch#6786
- pytorch/pytorch#5475
- pytorch/pytorch#9401
- pytorch/pytorch#8689
- pytorch/pytorch#8919

Pull Request resolved: pytorch/pytorch#10301
Differential Revision: D9204758
Pulled By: soumith
fbshipit-source-id: 09f489c1656458c02367b6cd31c3eeeca5acdc8a
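For the `powi` point above, a hedged sketch of an integral-exponent pow helper matching the signature shown in the review context; it assumes a non-negative exponent and is not necessarily the exact code that landed in `CUDANumerics.cuh`:

```cpp
// Hedged sketch of an integral-type pow helper using exponentiation by
// squaring. Assumes scalar_t is an integral type and b >= 0; not necessarily
// the exact implementation in CUDANumerics.cuh.
template <typename scalar_t>
static inline __host__ __device__ scalar_t powi(scalar_t a, scalar_t b) {
  scalar_t result = 1;
  while (b) {
    if (b & 1) {
      result *= a;
    }
    b >>= 1;
    a *= a;
  }
  return result;
}
```

Exponentiation by squaring needs only O(log b) multiplications, which is the usual reason to special-case integral exponents instead of routing them through the floating-point `pow`.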
* upstream/master: (89 commits)
  - move HeatmapMaxKeypointOp unittest to oss
  - fix xfails involving literals (pytorch#10905)
  - Bag of Distributions doc fixes (pytorch#10894)
  - Remove FIXME_zerol() from test_jit.py (pytorch#10900)
  - Increase BC for PackedSequence ctor (pytorch#9864)
  - Remove ability of Scalars to hold Tensors.
  - Begin a bestiary of MSVC/NVCC bugs. (pytorch#10883)
  - Prevent JIT from overspecializing to every single size configuration (pytorch#10844)
  - Handling failing test on ROCm.
  - Update mobile predictor caller's interface
  - Cache isContiguous and numel
  - Create class constant for string literal 'blob_names'
  - Conv BN fusion for 3D conv (pytorch#10239)
  - Stop using symbolic override for tracing RNNs (pytorch#10638)
  - Add registry to pybind_state (pytorch#10759)
  - Remove the nanopb submodule
  - Create at::linear (pytorch#10799)
  - Refactor THCNumerics and add common math functions for at::Half (pytorch#10301)
  - Remove Tensor constructor of Scalar. (pytorch#10852)
  - Revert D9492561: [pytorch][PR] Moving the operator argument to the front for kernelPointwiseApply.
  - ...
I am getting the same issue as @soumith. However, this is when building a CUDA extension. Error log:

Can confirm. Checked out 87a7840 and compiled everything again. Extension compilation succeeds without a hitch.
Hi @varunagrawal. Could you please post a snippet of the code you are compiling, especially the line which does the "<" comparison? From the error, it says you are comparing at::Half with at::Half, which shouldn't need THCNumerics.cuh. Would be happy to reproduce on my side and help. :)

The compiler isn't quite showing the file or code snippet that's causing the issue. This is what I get: I honestly can't decipher the issue other than it seems to have problems linking with some code in THCNumerics.cuh.

Could you please post the code for
@syed-ahmed I am not on that slack workspace and unfortunately I don't fit the criteria for creating an account there. You can find the code for

Is this the latest version of the

It is the latest. Doesn't

From what I can see here: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L13,

Huh alright. Though it seems weird that
I was able to reproduce the error. What's happening is that when comparing at::Half with at::Half, the compiler is getting confused about which conversion to do, since there are multiple operator overloads (while this is a problem, it is currently avoided by providing Half-related NVCC flags in the PyTorch build). Hence, the fix for your extension is, when building with setuptools, to add the flags there, as shown below. Currently, PyTorch top of tree already adds these flags for its own build.
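A hedged sketch of that workaround; the exact flag names are an assumption on my part (they are the usual `cuda_fp16.h` suppression macros), and the kernel is only a hypothetical illustration of the comparison in question:

```cpp
// Hedged sketch: pass the cuda_fp16.h suppression macros to nvcc when
// compiling the extension (flag names are an assumption, not taken from the
// original comment):
//
//   nvcc ... -D__CUDA_NO_HALF_OPERATORS__ \
//            -D__CUDA_NO_HALF_CONVERSIONS__ \
//            -D__CUDA_NO_HALF2_OPERATORS__
//
// With these defined, cuda_fp16.h does not declare its own __half operator
// overloads, so the comparison below resolves through at::Half's implicit
// conversion to float instead of being ambiguous.
#include <ATen/ATen.h>

__global__ void half_lt_kernel(const at::Half* a, const at::Half* b, bool* out) {
  *out = a[0] < b[0];  // at::Half -> float, then float comparison
}
```

With `torch.utils.cpp_extension`, the same defines can be supplied through the `extra_compile_args` argument mentioned in the follow-up below.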
When building CUDA extensions, we have to pass `extra_compile_args` now to avoid CUDA header collisions with half operator overloading (which happens through implicit half conversions); details: pytorch/pytorch#10301 (comment).
Summary: It was found that there are some issues when using comparison operators for half types when certain THC headers are included. I was able to reproduce this and added a test. I also fixed the issue by adding the proper definitions to avoid it. Reported in #10301 (comment). Related: pytorch/tutorials#292. soumith fmassa

Pull Request resolved: #11395
Differential Revision: D9725102
Pulled By: goldsborough
fbshipit-source-id: 630425829046bbebea3409bb792a9d62c91f41ad