Cuda half macros cleanup #10147
Conversation
ezyang left a comment:
LGTM, but the builds are failing ;)
Looks like I missed some in the torch module. Giving the CI another go at it. :)
@pytorchbot retest this please
Force-pushed from d1a8c44 to 3708e44.
@syed-ahmed @ezyang Should we move forward with this patch? The …
This PR and #10301 are good to go.
facebook-github-bot left a comment:
soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This PR removes a couple of macros throughout TH* as part of the refactoring effort for ATen. Removing these macros should avoid confusion among developers who are moving code from TH* to ATen. This PR is part of the THCNumerics deprecation I have been working on, following up on mruberry's pytorch/pytorch#9318. I am separating these two commits to check that removing these macros doesn't upset the PyTorch public CI or internal builds.
- Commit pytorch/pytorch@1248de7 removes the code paths guarded by the `CUDA_HALF_INSTRUCTIONS` macro. Since the macro was removed in commit pytorch/pytorch@2f186df, `#ifdef CUDA_HALF_INSTRUCTIONS` always evaluates to false, so the code path kept after this change is the one for the false case of `#ifdef CUDA_HALF_INSTRUCTIONS`.
- Commit pytorch/pytorch@520c99b removes the code paths guarded by the `CUDA_HALF_TENSOR` macro. Since PyTorch now supports only CUDA 8.0 and above, and CUDA 8.0 satisfies `CUDA_HAS_FP16`, `CUDA_HALF_TENSOR` is always defined, so the code path kept after this change is the one for the true case of `#ifdef CUDA_HALF_TENSOR`.

Pull Request resolved: pytorch/pytorch#10147
Differential Revision: D9345940
Pulled By: soumith
fbshipit-source-id: c9392261dd432d304f1cdaf961760cbd164a59d0
Summary:
**Summary**: This PR is a follow-up to mruberry's #9318. It aims to:
- Specialize std common math functions for the `at::Half` type.
- Create `CUDANumerics.cuh` to hold the necessary parts of `THCNumerics.cuh`.
- Update `THCNumerics.cuh` with new usage and comments demonstrating best practice for developers, paving the way for its deprecation.
- Remove legacy/redundant code paths.
- Remove unused CUDA HALF macros (see separate PR #10147).

**Comments**: `CUDANumerics.cuh` contains mathematical functions that are either not in the std namespace or are specialized for compilation with CUDA NVCC or CUDA NVRTC. This header is derived from the legacy `THCNumerics.cuh`. Some of the rationale for which functions were kept and which were removed:
- All arithmetic can now be done in ATen using the binary CUDA kernel or CUDA tensor pointwise apply (see #8919 and `CUDAApplyUtils`). `at::Half` comparisons rely on implicit conversion to float.
- Functions that are C/C++ standard compliant have been specialized for user-defined types; for instance, the std namespace has been opened up for `at::Half`, which defines math functions for `at::Half`. See `Half-inl.h` (and the sketch after this summary for the general shape of such a specialization).
- Some standard-compliant functions are specialized here for performance reasons. For instance, `powi` is used for `pow` on integral types. Moreover, `abs`, `isinf`, and `isnan` are specialized to save one API call versus going through std, although this is subject to change depending on whether we really care about saving one API call.
- Numeric limits such as `max`/`min` are removed since they call standard defines; numeric limits for `at::Half` are present in `Half-inl.h`. I understood that HIP has some issue with `std::numeric_limits`, and this is the related GitHub issue I found: ROCm/hip#374. AlexVlx mentions that the issue can be avoided by invoking `std::numeric_limits` in `__device__` code. Since we launch lambdas with device contexts, I don't see why `std::numeric_limits` wouldn't compile on HIP when used within a kernel, unless I am unaware of the real reason `max`/`min` were in THCNumerics in the first place. (I haven't tried a build with HIP.)

Some reference PRs that were handy while refactoring TH into ATen:
- #6786
- #5475
- #9401
- #8689
- #8919

Pull Request resolved: #10301
Differential Revision: D9204758
Pulled By: soumith
fbshipit-source-id: 09f489c1656458c02367b6cd31c3eeeca5acdc8a
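The summary above mentions relying on implicit conversion to float for `at::Half` comparisons and keeping numeric limits for `at::Half` in `Half-inl.h`. The sketch below is a minimal, hypothetical illustration of that pattern only; it does not quote `Half-inl.h`, and the `half_sketch` namespace, the float-backed storage, and the member names are placeholders invented for the example.

```cpp
// Minimal sketch, NOT the actual Half-inl.h: the real at::Half stores raw
// fp16 bits, while this stand-in stores a float so the example is self-contained.
#include <cmath>
#include <iostream>
#include <limits>

namespace half_sketch {
struct Half {
  float v;  // placeholder storage; the real type holds fp16 bits
  Half(float f = 0.0f) : v(f) {}
  operator float() const { return v; }  // implicit conversion used for math/comparisons
};
}  // namespace half_sketch

namespace std {
// Explicitly specializing a std template for a program-defined type is legal;
// this mirrors "numeric limits for at::Half is present in Half-inl.h".
template <>
class numeric_limits<half_sketch::Half> {
 public:
  static constexpr bool is_specialized = true;
  static half_sketch::Half max() { return half_sketch::Half(65504.0f); }      // largest finite fp16
  static half_sketch::Half lowest() { return half_sketch::Half(-65504.0f); }  // most negative finite fp16
};
}  // namespace std

int main() {
  half_sketch::Half a(1.5f), b(2.0f);
  // Comparisons and std math fall back to float through the implicit conversion.
  std::cout << (a < b) << ' ' << std::sqrt(static_cast<float>(b)) << ' '
            << static_cast<float>(std::numeric_limits<half_sketch::Half>::max()) << '\n';
  return 0;
}
```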
This check was introduced by #5417 and then turned into a tautology by #10147, so I guess it's time to let go of all that dynamic initialization (and maybe just delete it in 2.3?).
Pull Request resolved: #115884
Approved by: https://github.com/kit1980
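For readers unfamiliar with the term, the sketch below is a generic, hypothetical illustration of a guard that becomes a tautology once its precondition is guaranteed by the minimum supported toolchain; it does not quote the check from #5417 or #115884, and the constants and function name are invented for the example.

```cpp
// Hypothetical illustration only; not the actual check removed in #115884.
#include <iostream>

// Once the minimum supported CUDA version is 8.0, a guard like this can never
// be false, so the check (and any initialization work it gated) is a tautology
// and can simply be deleted.
constexpr int kMinSupportedCuda = 8000;  // assumption for the sketch
constexpr int kBuildCudaVersion = 9000;  // any supported build satisfies this

bool half_support_check() {
  return kBuildCudaVersion >= kMinSupportedCuda;  // always true
}

int main() {
  std::cout << std::boolalpha << half_support_check() << '\n';  // prints "true"
  return 0;
}
```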
Summary: This PR removes a couple of macros throughout TH* as part of the refactoring effort for ATen. Removing these macros should avoid confusion among developers who are moving code from TH* to ATen. This PR is part of the THCNumerics deprecation I have been working on, following up on @mruberry's #9318. I am separating these two commits to check that removing these macros doesn't upset the PyTorch public CI or internal builds.
- Commit 1248de7 removes the code paths guarded by the `CUDA_HALF_INSTRUCTIONS` macro. Since commit 2f186df removed that macro, `#ifdef CUDA_HALF_INSTRUCTIONS` always evaluates to false, so the code path kept after this change is the one for the false case of `#ifdef CUDA_HALF_INSTRUCTIONS`.
- Commit 520c99b removes the code paths guarded by the `CUDA_HALF_TENSOR` macro. Since PyTorch now supports only CUDA 8.0 and above, and CUDA 8.0 satisfies `CUDA_HAS_FP16`, `CUDA_HALF_TENSOR` is always defined, so the code path kept after this change is the one for the true case of `#ifdef CUDA_HALF_TENSOR` (see the before/after sketch below).
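As a minimal illustration of what "keeping only the true case" means in practice, here is a hypothetical before/after sketch; the variable, strings, and macro definition below are made up for the example and do not quote the TH* sources.

```cpp
// Hypothetical sketch of the CUDA_HALF_TENSOR cleanup pattern; not actual TH* code.
#include <cstdio>

// "Before": every such site carried both branches behind the macro.
#define CUDA_HALF_TENSOR 1  // with CUDA >= 8.0, CUDA_HAS_FP16 (and thus this) always holds

#ifdef CUDA_HALF_TENSOR
static const char* kPath = "fp16 code path (the 'true' branch that the PR keeps)";
#else
static const char* kPath = "fp32-only fallback (dead code the PR deletes)";
#endif

int main() {
  // "After": only the first definition remains and the #ifdef/#else/#endif
  // scaffolding around it is removed, since the condition can never be false.
  std::printf("%s\n", kPath);
  return 0;
}
```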