
[CMake] Fix USE_FBGEMM_GENAI option#164165

Closed
malfet wants to merge 2 commits into gh/malfet/542/base from gh/malfet/542/head

Conversation

@malfet
Contributor

@malfet malfet commented Sep 29, 2025

Stack from ghstack (oldest at bottom):


  • cmake_dependent_option condition should be USE_ROCM OR (USE_CUDA AND NOT MSVC) (similar to the one for flash attention)
  • Default settings should be user overridable, i.e. even if one builds for SM_10, they should be able to pass USE_FBGEMM_GENAI=0 and skip the build
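The two fixes above can be sketched roughly as follows (a minimal sketch, not the actual diff; the option description string and the exact default computation are assumptions):

```cmake
# Compute a platform-dependent default, but keep it user-overridable:
# option() never clobbers an existing cache entry, so -DUSE_FBGEMM_GENAI=0
# on the command line always wins over the computed default.
set(USE_FBGEMM_GENAI_DEFAULT OFF)
if(USE_ROCM AND "gfx942" IN_LIST PYTORCH_ROCM_ARCH)
  set(USE_FBGEMM_GENAI_DEFAULT ON)
endif()

include(CMakeDependentOption)
# Condition mirrors the flash-attention option: ROCm, or CUDA without MSVC.
cmake_dependent_option(
  USE_FBGEMM_GENAI
  "Build FBGEMM GenAI kernels"
  "${USE_FBGEMM_GENAI_DEFAULT}"
  "USE_ROCM OR (USE_CUDA AND NOT MSVC)"
  OFF)
```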

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Sep 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164165

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 58c1c07 with merge base 069ccf5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

malfet added a commit that referenced this pull request Sep 29, 2025
- `cmake_dependent_option` condition should be `USE_ROCM OR (USE_CUDA AND NOT MSVC)` (similar to the one for flash attention)
- Default settings should be user overridable, i.e. even if one builds for SM_10, they should be able to pass `USE_FBGEMM_GENAI=0` and skip the build

ghstack-source-id: e741c64
Pull Request resolved: #164165
@malfet malfet added release notes: build release notes category topic: bug fixes topic category labels Sep 29, 2025
@malfet malfet added the release notes: build and topic: bug fixes labels Sep 29, 2025
Copy link
Collaborator

@Skylion007 Skylion007 left a comment


Looks good, but I'm confused about why we are removing useful debugging print info from the CMake summary:

```cmake
message(STATUS "  USE_EIGEN_FOR_BLAS    : ${CAFFE2_USE_EIGEN_FOR_BLAS}")
message(STATUS "  USE_EIGEN_FOR_SPARSE  : ${USE_EIGEN_SPARSE}")
message(STATUS "  USE_FBGEMM            : ${USE_FBGEMM}")
message(STATUS "  USE_FBGEMM_GENAI      : ${USE_FBGEMM_GENAI}")
```
Collaborator


Why remove the printing here?

Contributor Author


I'm adding the print statement, aren't I?

```cmake
IF(USE_ROCM AND "gfx942" IN_LIST PYTORCH_ROCM_ARCH)
  message(WARNING "Setting USE_FBGEMM_GENAI for gfx942 to ON by default, doing ROCM build")
  set(USE_FBGEMM_GENAI_DEFAULT ON)
elseif(USE_CUDA AND "$ENV{TORCH_CUDA_ARCH_LIST}" MATCHES "10.0" AND CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL 12.8 AND NOT WIN32)
```
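The excerpt quoted above ends mid-branch; presumably the CUDA case also flips the default and everything else leaves it OFF, so the full logic would look roughly like this (a hedged reconstruction, not the verbatim diff):

```cmake
set(USE_FBGEMM_GENAI_DEFAULT OFF)
if(USE_ROCM AND "gfx942" IN_LIST PYTORCH_ROCM_ARCH)
  message(WARNING "Setting USE_FBGEMM_GENAI for gfx942 to ON by default, doing ROCM build")
  set(USE_FBGEMM_GENAI_DEFAULT ON)
elseif(USE_CUDA AND "$ENV{TORCH_CUDA_ARCH_LIST}" MATCHES "10.0"
       AND CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL 12.8
       AND NOT WIN32)
  # Datacenter Blackwell (sm_100) with CUDA 12.8+ on a non-Windows host.
  set(USE_FBGEMM_GENAI_DEFAULT ON)
endif()
```

Note that `MATCHES "10.0"` is a regex match against the whole environment value, so any arch list containing `10.0` (including `10.0a`) satisfies it.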
Collaborator


Only for datacenter Blackwell? Do we not need it for 11.0 or 12.0 arches too?

Contributor Author


This check is, frankly, insane, and yes, it should probably be enabled for more GPU architectures; but at least this change makes the logic user-overridable, so one can build for SM_12 if they choose to.

Collaborator


Yeah, this is going to be a footgun for consumer Blackwell soon, though.

Contributor Author


I'm not sure it'll work on consumer Blackwell, as it compiles for sm_100a, which is not directly runnable on or translatable to sm_12.

Contributor

@danielvegamyhre danielvegamyhre Sep 29, 2025


The mxfp8 grouped gemm in fbgemm must be built with sm100a and cuda 12.8+, which is why this check exists the way it does. Furthermore, torch nightly builds only set TORCH_CUDA_ARCH_LIST=10.0, never 10.0a due to lack of portability, so we have to add sm100a to the build targets for fbgemm.

I am not sure if sm120 supports the tcgen* PTX instructions needed for the mxfp8 grouped gemm kernel?
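If sm_100a does have to be added explicitly when nightlies export `TORCH_CUDA_ARCH_LIST=10.0`, the append could look roughly like this (a hypothetical sketch; the `fbgemm_genai` target name and the variable are illustrative, not taken from the actual build files):

```cmake
# Nightly builds only export the portable 10.0 arch, but the mxfp8 grouped
# GEMM kernels need the arch-specific sm_100a variant (tcgen* instructions),
# so append it for this target only.
set(FBGEMM_GENAI_CUDA_ARCHS "100")        # from TORCH_CUDA_ARCH_LIST=10.0
if(CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL 12.8)
  list(APPEND FBGEMM_GENAI_CUDA_ARCHS "100a")
endif()
set_property(TARGET fbgemm_genai PROPERTY CUDA_ARCHITECTURES ${FBGEMM_GENAI_CUDA_ARCHS})
```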

@malfet malfet added the ciflow/trunk label Sep 29, 2025
[ghstack-poisoned]
malfet added a commit that referenced this pull request Sep 29, 2025
- `cmake_dependent_option` condition should be `USE_ROCM OR (USE_CUDA AND NOT MSVC)` (similar to the one for flash attention)
- Default settings should be user overridable, i.e. even if one builds for SM_10, they should be able to pass `USE_FBGEMM_GENAI=0` and skip the build

ghstack-source-id: fe63e9b
Pull Request resolved: #164165
@malfet
Contributor Author

malfet commented Sep 29, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@github-actions github-actions bot deleted the gh/malfet/542/head branch October 31, 2025 02:16

Labels

ciflow/trunk · Merged · release notes: build · topic: bug fixes

4 participants