[cuBLAS] update cuBLAS determinism docs, remove workspace requirement checks #161749

Closed
eqy wants to merge 4 commits into pytorch:main from eqy:cublasnowdeterministic

eqy (Collaborator) commented Aug 28, 2025

Since CUDA 11.x (the docs need updating for this; the current PR says 12.2, which is incorrect), we've been allocating cuBLAS workspaces explicitly per handle/stream combination (#85447).
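
The idea of one workspace per handle/stream combination can be sketched in plain Python as a cache keyed by that pair. This is only an illustration, not the code from #85447: `WorkspaceCache`, the integer handle and stream IDs, and the buffer size are hypothetical stand-ins.

```python
from typing import Dict, Tuple

# Hypothetical stand-ins: in a real CUDA program these would be a
# cuBLAS handle and a CUDA stream; here they are plain integers.
Handle = int
Stream = int


class WorkspaceCache:
    """Lazily allocates one workspace per (handle, stream) pair."""

    def __init__(self, nbytes: int) -> None:
        # nbytes is an assumed size for illustration only.
        self._nbytes = nbytes
        self._buffers: Dict[Tuple[Handle, Stream], bytearray] = {}

    def get(self, handle: Handle, stream: Stream) -> bytearray:
        key = (handle, stream)
        if key not in self._buffers:
            # A bytearray stands in for a device memory allocation.
            self._buffers[key] = bytearray(self._nbytes)
        return self._buffers[key]


cache = WorkspaceCache(nbytes=1024)
a = cache.get(handle=1, stream=7)
b = cache.get(handle=1, stream=7)  # same pair: the same buffer is reused
c = cache.get(handle=1, stream=8)  # different stream: a separate buffer
```

Because each pair owns its buffer, work issued on one stream never shares scratch space with work issued on another.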

According to the cuBLAS documentation (https://docs.nvidia.com/cuda/cublas/#results-reproducibility), this appears to be sufficient for determinism without any explicit workspace configuration such as `:4096:8` or `:16:8`, which the PyTorch docs previously called for.
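
For readers unfamiliar with values like `:4096:8`, here is a small sketch of how such a string can be read. It assumes the `:SIZE:COUNT` pair format of the `CUBLAS_WORKSPACE_CONFIG` environment variable, with sizes in KiB as I understand the cuBLAS documentation. The helper names `parse_workspace_config` and `total_kib` are hypothetical and are not part of cuBLAS or PyTorch.

```python
from typing import List, Tuple


def parse_workspace_config(value: str) -> List[Tuple[int, int]]:
    """Parse ':SIZE:COUNT[:SIZE:COUNT...]' into (size_kib, count) pairs."""
    if not value.startswith(":"):
        raise ValueError(f"expected a leading ':' in {value!r}")
    fields = value[1:].split(":")
    if len(fields) % 2 != 0:
        raise ValueError(f"expected SIZE:COUNT pairs in {value!r}")
    numbers = [int(field) for field in fields]
    # Even positions are sizes, odd positions are buffer counts.
    return list(zip(numbers[0::2], numbers[1::2]))


def total_kib(pairs: List[Tuple[int, int]]) -> int:
    """Total workspace implied by the pairs, in KiB (assumed unit)."""
    return sum(size * count for size, count in pairs)


large = parse_workspace_config(":4096:8")  # eight buffers of 4096 KiB
small = parse_workspace_config(":16:8")    # eight buffers of 16 KiB
```

Under that reading, `:4096:8` reserves 32768 KiB in total and `:16:8` reserves 128 KiB.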

Planning to add an explicit determinism test as well...
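
One possible shape for such a check is to run the same computation several times and require bit-identical results. The sketch below is not the test added in this PR; `is_bitwise_reproducible` and `RotatingOrder` are hypothetical names, and a pure-Python float sum stands in for a cuBLAS call so the example runs anywhere.

```python
from typing import Callable, List


def is_bitwise_reproducible(fn: Callable[[], float], runs: int = 5) -> bool:
    """Return True if every call to fn() yields a bit-identical float."""
    # float.hex() gives an exact representation, so -0.0 and 0.0 differ.
    first = fn().hex()
    return all(fn().hex() == first for _ in range(runs - 1))


VALUES: List[float] = [1e16, 1.0, -1e16]


def fixed_order() -> float:
    """Sums the values in the same order on every call."""
    total = 0.0
    for v in VALUES:
        total += v
    return total


class RotatingOrder:
    """Sums the same values, starting one element later on each call."""

    def __init__(self, values: List[float]) -> None:
        self.values = list(values)
        self.calls = 0

    def __call__(self) -> float:
        k = self.calls % len(self.values)
        self.calls += 1
        total = 0.0
        for v in self.values[k:] + self.values[:k]:
            total += v
        return total


stable = is_bitwise_reproducible(fixed_order)
unstable = is_bitwise_reproducible(RotatingOrder(VALUES))
```

Floating-point addition is not associative, so the rotating version eventually returns 1.0 instead of 0.0 and the check fails. In PyTorch, the same pattern could be adapted by repeating a matmul on fixed CUDA inputs and comparing the outputs with `torch.equal`.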

cc @ptrblck @msaroufim @jerryzh168 @csarofeen @xwang233 @mruberry @kurtamohler

eqy requested a review from syed-ahmed as a code owner August 28, 2025 23:49
pytorch-bot commented Aug 28, 2025

✅ No Failures

As of commit 896198b with merge base ac7b4e7: there are no failures yet.

eqy added the module: cuda, module: cublas, module: determinism, open source, release notes: cuda, ciflow/trunk, and ciflow/h100 labels Aug 28, 2025
zou3519 added the triaged label Sep 3, 2025
zou3519 requested a review from ngimel September 3, 2025 16:31
  [ 0.0333, -1.1444]]], device='cuda:0')

- Furthermore, if you are using CUDA tensors, and your CUDA version is 10.2 or greater, you
+ Furthermore, if you are using CUDA tensors, and your CUDA version is between 10.2 and 11.0 you
Collaborator

no longer supported, just remove this sentence

ngimel mentioned this pull request Sep 4, 2025

ngimel (Collaborator) commented Sep 10, 2025

Can this one be landed?

eqy (Collaborator, Author) commented Sep 10, 2025

Sure, let's see the CI signal after removing the old deterministic alert test

ngimel (Collaborator) commented Sep 11, 2025

Now there's a dtype error in _scaled_mm that shouldn't be related?

eqy (Collaborator, Author) commented Sep 15, 2025

H100 _scaled_mm failure should be addressed by #162022
I think we're seeing it because I manually opted into ciflow/H100 here

ngimel (Collaborator) commented Sep 18, 2025

ciflow/H100 is still run on trunk (see HUD); if it doesn't report existing failures, that's a problem (and it looks like it doesn't).

eqy force-pushed the cublasnowdeterministic branch from 3f484ec to 896198b October 2, 2025 18:42
eqy requested a review from Aidyn-A as a code owner October 2, 2025 18:42

eqy (Collaborator, Author) commented Oct 2, 2025

@pytorchmergebot merge

pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
… checks (pytorch#161749)

Pull Request resolved: pytorch#161749
Approved by: https://github.com/ngimel

Labels

ciflow/h100, ciflow/trunk, Merged, module: cublas, module: cuda, module: determinism, open source, release notes: cuda, triaged

4 participants