[CMake] Fix USE_FBGEMM_GENAI option#164165
malfet wants to merge 2 commits into gh/malfet/542/base
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164165
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 58c1c07 with merge base 069ccf5. This comment was automatically generated by Dr. CI and updates every 15 minutes.
- `cmake_dependent_option` condition should be `USE_ROCM OR (USE_CUDA AND NOT MSVC)` (similar to the one for flash attention)
- Default settings should be user overridable, i.e. even if one builds for SM_10, they should be able to pass `USE_FBGEMM_GENAI=0` and skip the build

ghstack-source-id: e741c64
Pull Request resolved: #164165
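For reference, a minimal sketch of that pattern is shown below; the help string and the exact structure of the default-selection block are illustrative, not a verbatim copy of the PyTorch CMake code:

```cmake
include(CMakeDependentOption)

# Pick a default only; an explicit -DUSE_FBGEMM_GENAI=0/1 on the command line
# (or the corresponding env var consumed by setup.py) still wins, because the
# cache entry set by the user takes precedence over this heuristic.
set(USE_FBGEMM_GENAI_DEFAULT OFF)
if(USE_ROCM AND "gfx942" IN_LIST PYTORCH_ROCM_ARCH)
  set(USE_FBGEMM_GENAI_DEFAULT ON)
elseif(USE_CUDA AND "$ENV{TORCH_CUDA_ARCH_LIST}" MATCHES "10.0"
       AND CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL 12.8
       AND NOT WIN32)
  set(USE_FBGEMM_GENAI_DEFAULT ON)
endif()

# The option only exists when the dependent condition holds; otherwise it is
# forced to OFF, mirroring how the flash-attention option is gated.
cmake_dependent_option(
  USE_FBGEMM_GENAI
  "Build FBGEMM GenAI kernels"
  ${USE_FBGEMM_GENAI_DEFAULT}
  "USE_ROCM OR (USE_CUDA AND NOT MSVC)"
  OFF)
```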
Skylion007 left a comment:
Looks good, but I'm confused by why we are removing useful debugging print info from the CMake Summary.
| message(STATUS " USE_EIGEN_FOR_BLAS : ${CAFFE2_USE_EIGEN_FOR_BLAS}") | ||
| message(STATUS " USE_EIGEN_FOR_SPARSE : ${USE_EIGEN_SPARSE}") | ||
| message(STATUS " USE_FBGEMM : ${USE_FBGEMM}") | ||
| message(STATUS " USE_FBGEMM_GENAI : ${USE_FBGEMM_GENAI}") |
Why remove the printing here?
I'm adding the print statement, aren't I?
IF(USE_ROCM AND "gfx942" IN_LIST PYTORCH_ROCM_ARCH)
  message(WARNING "Setting USE_FBGEMM_GENAI for gfx942 to ON by default, doing ROCM build")
  set(USE_FBGEMM_GENAI_DEFAULT ON)
elseif(USE_CUDA AND "$ENV{TORCH_CUDA_ARCH_LIST}" MATCHES "10.0" AND CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL 12.8 AND NOT WIN32)
Only for datacenter Blackwell? Do we not need it for 11.0 or 12.0 arches too?
To be frank, this check is insane, and yes, it should probably be enabled for more GPU architectures. But at least it makes the logic user-overridable, so one can build for SM_12 if they choose to.
Yeah, this is going to be a footgun for consumer Blackwell soon, though.
I'm not sure it'll work on consumer Blackwells, as it compiles for sm_100a, which is not directly runnable on, or translatable to, sm_12.
The mxfp8 grouped gemm in fbgemm must be built with sm100a and CUDA 12.8+, which is why this check exists the way it does. Furthermore, torch nightly builds only set TORCH_CUDA_ARCH_LIST=10.0, never 10.0a, due to lack of portability, so we have to add sm100a to the build targets for fbgemm.
I am not sure whether sm120 supports the tcgen* PTX instructions needed for the mxfp8 grouped gemm kernel.
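As a rough illustration of that last point, an FBGEMM-specific arch list could append the `a` variant without touching the global one. This is only a sketch: the `FBGEMM_GENAI_CUDA_ARCHS` variable name is hypothetical, not the one actually used in the build, and the separator handling is simplified.

```cmake
# Hypothetical sketch: nightly builds export TORCH_CUDA_ARCH_LIST=10.0 (never
# 10.0a), but the mxfp8 grouped gemm needs the arch-specific sm100a features,
# so the "a" variant is appended for the FBGEMM GenAI sources only.
set(FBGEMM_GENAI_CUDA_ARCHS "$ENV{TORCH_CUDA_ARCH_LIST}")  # e.g. "10.0"
if(FBGEMM_GENAI_CUDA_ARCHS MATCHES "10\\.0" AND NOT FBGEMM_GENAI_CUDA_ARCHS MATCHES "10\\.0a")
  string(APPEND FBGEMM_GENAI_CUDA_ARCHS " 10.0a")  # keep the space-separated format
endif()
# FBGEMM_GENAI_CUDA_ARCHS would then drive only the GenAI targets, leaving the
# rest of the build on the unmodified TORCH_CUDA_ARCH_LIST.
```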
- `cmake_dependent_option` condition should be `USE_ROCM OR (USE_CUDA AND NOT MSVC)` (similar to the one for flash attention)
- Default settings should be user overridable, i.e. even if one builds for SM_10, they should be able to pass `USE_FBGEMM_GENAI=0` and skip the build

ghstack-source-id: fe63e9b
Pull Request resolved: #164165
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):

- [CMake] Fix `USE_FBGEMM_GENAI` option #164165

- `cmake_dependent_option` condition should be `USE_ROCM OR (USE_CUDA AND NOT MSVC)` (similar to the one for flash attention)
- Default settings should be user overridable, i.e. even if one builds for SM_10, they should be able to pass `USE_FBGEMM_GENAI=0` and skip the build