Extend DispatchStub to support CUDA dispatch #9579
Conversation
This is part of pytorch#8919. It's separated to make it easier to merge the PR in pieces.

There are a few major changes to DispatchStub:
- The environment variable ATEN_CPU_CAPABILITY overrides the CPU capability detection code (previously ATEN_DISABLE_AVX/AVX2).
- DispatchStub is defined in the generic native code instead of in the CPU_CAPABILITY_DEFAULT kernel.
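A minimal sketch of how this dispatch path could look. The type and function names below (DispatchStubSketch, choose_cpu_impl, compute_cpu_capability) are illustrative only, and the accepted ATEN_CPU_CAPABILITY values ("default"/"avx"/"avx2") are assumptions, not taken verbatim from this PR:

```
#include <cstdlib>
#include <cstring>

// Illustrative capability levels; the real set in ATen may differ.
enum class CPUCapability { DEFAULT = 0, AVX = 1, AVX2 = 2 };

// The ATEN_CPU_CAPABILITY environment variable overrides detection.
inline CPUCapability compute_cpu_capability() {
  if (const char* env = std::getenv("ATEN_CPU_CAPABILITY")) {
    if (std::strcmp(env, "avx2") == 0) return CPUCapability::AVX2;
    if (std::strcmp(env, "avx") == 0) return CPUCapability::AVX;
    if (std::strcmp(env, "default") == 0) return CPUCapability::DEFAULT;
  }
  // Real code would query the CPU (e.g. via CPUID); assume AVX2 for the sketch.
  return CPUCapability::AVX2;
}

// One function pointer per CPU capability, plus a slot the CUDA library can
// fill in at load time.
template <typename FnPtr>
struct DispatchStubSketch {
  FnPtr default_impl = nullptr;
  FnPtr avx_impl = nullptr;
  FnPtr avx2_impl = nullptr;
  FnPtr cuda_impl = nullptr;

  FnPtr choose_cpu_impl() const {
    CPUCapability cap = compute_cpu_capability();
    if (cap >= CPUCapability::AVX2 && avx2_impl) return avx2_impl;
    if (cap >= CPUCapability::AVX && avx_impl) return avx_impl;
    return default_impl;
  }
};
```

Because the stub lives in the generic native code, every build sees the same struct; only which kernel pointers get filled in differs by build configuration and runtime capability.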
facebook-github-bot
left a comment
@colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
If you're changing the capability variable name, please also edit .jenkins/pytorch/test.sh accordingly.
Summary: This is a few files taken from pytorch/pytorch#8919. They're unchanged from the latest versions of that PR.
```
This is part of pytorch/pytorch#8919. It's separated to make it easier to merge the PR in pieces.

There are a few major changes to DispatchStub
- The environment variable ATEN_CPU_CAPABILITY overrides the CPU capability detection code (Previous ATEN_DISABLE_AVX/AVX2)
- DispatchStub is defined in the generic native code instead of the CPU_CAPABILITY_DEFAULT kernel.
```
Pull Request resolved: pytorch/pytorch#9579
Differential Revision: D8909000
Pulled By: colesbury
fbshipit-source-id: fdeb606270b06acdab3c01dba97ec9d81584ecc0
This reverts commit bcf0bf4.
This is a modification of the strategy from pytorch#8919 and pytorch#9579.

Previously, the CPU architecture-specific kernels self-registered with the DispatchStub. When linking as part of a static library, this requires the --whole-archive flag to be passed to the linker to ensure that the object files for the kernels are included; Caffe2 and TensorFlow use that strategy. We ran into some issues with --whole-archive blowing up the binary size of some downstream projects in Facebook.

This PR avoids --whole-archive for CPU kernels. The downside is that the generic code needs to be aware of whether kernels are compiled with AVX and with AVX2 (via HAVE_AVX_CPU_DEFINITION and HAVE_AVX2_CPU_DEFINITION). The CUDA kernels still self-register with DispatchStub because the CPU library is not aware of whether the CUDA library will be available at runtime. A sketch of the two registration paths follows this description.

There are a few major changes to DispatchStub:
- The environment variable ATEN_CPU_CAPABILITY overrides the CPU capability detection code (previously ATEN_DISABLE_AVX/AVX2).
- DispatchStub is defined in the generic native code instead of in the CPU_CAPABILITY_DEFAULT kernel.
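A rough sketch of the two registration paths described above. The kernel and struct names (my_kernel_*, MyStub, RegisterMyCUDAKernel, my_cuda_kernel) are hypothetical; only HAVE_AVX_CPU_DEFINITION and HAVE_AVX2_CPU_DEFINITION come from the description:

```
#include <cstdint>

using fn_t = void (*)(float* out, const float* in, int64_t n);

// --- Generic native code (CPU library) ----------------------------------
// The generic code names the variant kernels directly; the HAVE_* macros
// (set by the build system) say which variants were actually compiled, so
// the linker pulls in the kernel object files without --whole-archive.
extern fn_t my_kernel_DEFAULT;   // defined in the default-capability file
#ifdef HAVE_AVX_CPU_DEFINITION
extern fn_t my_kernel_AVX;       // defined in the AVX-compiled file
#endif
#ifdef HAVE_AVX2_CPU_DEFINITION
extern fn_t my_kernel_AVX2;      // defined in the AVX2-compiled file
#endif

struct MyStub {
  fn_t cuda_fn = nullptr;        // filled in by the CUDA library, if loaded

  fn_t choose_cpu() const {
    // Runtime detection (plus the ATEN_CPU_CAPABILITY override) would
    // normally decide here; the sketch just prefers the best compiled-in
    // variant.
#ifdef HAVE_AVX2_CPU_DEFINITION
    return my_kernel_AVX2;
#elif defined(HAVE_AVX_CPU_DEFINITION)
    return my_kernel_AVX;
#else
    return my_kernel_DEFAULT;
#endif
  }
};
MyStub my_stub;

// --- CUDA library --------------------------------------------------------
// The CPU library cannot name CUDA symbols, so the CUDA side registers its
// kernel with the stub from a static initializer when its library is loaded:
//
//   struct RegisterMyCUDAKernel {
//     RegisterMyCUDAKernel() { my_stub.cuda_fn = &my_cuda_kernel; }
//   } register_my_cuda_kernel;
```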
Summary: This is a modification of the strategy from #8919 and #9579.
```
Previously, the CPU architecture-specific kernels self-registered with the DispatchStub. When linking as part of a static library, this requires the flag --whole-archive to be passed to the linker to ensure that the object files for the kernels are included. Caffe2 and TensorFlow use that strategy.

We ran into some issues with --whole-archive blowing up the binary size of some downstream projects in Facebook.

This PR avoids --whole-archive for CPU kernels. The downside is that the generic code needs to be aware of whether kernels are compiled with AVX and with AVX2 (via HAVE_AVX_CPU_DEFINITION and HAVE_AVX2_CPU_DEFINITION). The CUDA kernels still self-register with DispatchStub because the CPU library is not aware of whether the CUDA library will be available at runtime.

There are a few major changes to DispatchStub
- The environment variable ATEN_CPU_CAPABILITY overrides the CPU capability detection code (Previous ATEN_DISABLE_AVX/AVX2)
- DispatchStub is defined in the generic native code instead of the CPU_CAPABILITY_DEFAULT kernel.
```
Pull Request resolved: #9664
Differential Revision: D8943350
Pulled By: colesbury
fbshipit-source-id: 329229b0ee9ff94fc001b960287814bd734096ef
…ytorch#9614)
Summary: This reverts commit bcf0bf4. The commit was causing issues for some internal FB projects.
Pull Request resolved: pytorch#9614
Reviewed By: Yangqing
Differential Revision: D8929552
Pulled By: colesbury
fbshipit-source-id: ae9026ad8762a4c5de401273694b4c878fc241a6
…code to disable warnings for ROCm builds (#1642) Fixes pytorch#9579
This is a few files taken from #8919. They're unchanged from the latest versions of that PR.