Use cuda nvrtc so file based on cuda version used by torch#163642

Closed
atalman wants to merge 3 commits into pytorch:main from atalman:fix_nvrtc_version_loading

atalman (Contributor) commented Sep 23, 2025

pytorch-bot bot commented Sep 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163642

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 26e0629 with merge base 1a42656:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

msaroufim added the release notes: cuda label Sep 23, 2025


def _get_nvrtc_library() -> ctypes.CDLL:
    major_version = int(torch.version.cuda.split(".")[0])  # type: ignore[union-attr]
Collaborator

Just use not_none from torch.utils._typing_utils here instead of a type ignore.

Contributor Author

Tried this. It looks like importing not_none from torch.utils._typing_utils creates a circular dependency issue.
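
One way to drop the type ignore without importing from torch.utils._typing_utils is an explicit None check at the call site. The following is a minimal sketch, not the code in this PR: the helper name cuda_major_version is hypothetical, and the version string is passed in by the caller so the snippet runs without torch installed.

```python
from typing import Optional


def cuda_major_version(cuda_version: Optional[str]) -> int:
    """Return the major component of a CUDA version string such as "12.8".

    Hypothetical helper for illustration; the caller would pass
    torch.version.cuda, which is None on builds without CUDA support.
    """
    if cuda_version is None:
        # The explicit check narrows Optional[str] to str for the type
        # checker, so no "# type: ignore" comment is needed below.
        raise RuntimeError("CUDA version is unavailable in this build")
    return int(cuda_version.split(".")[0])
```

With this in place the call site could read major_version = cuda_major_version(torch.version.cuda), at the cost of one extra small function in the module.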

        return ctypes.CDLL(lib_name)
    except OSError:
        continue
raise OSError("Could not find any NVRTC library")
Collaborator

Why not store the last OSError in an optional and chain from there to preserve the trace?
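
A sketch of what this suggestion might look like, under stated assumptions: the function name load_first_library and the loader parameter are hypothetical (the loader is injectable only so the example can be exercised without real shared libraries), and this is not the code that was merged.

```python
import ctypes
from typing import Callable, Optional, Sequence


def load_first_library(
    lib_names: Sequence[str],
    loader: Callable[[str], object] = ctypes.CDLL,
) -> object:
    """Return the first library that loads, trying lib_names in order."""
    last_error: Optional[OSError] = None
    for lib_name in lib_names:
        try:
            return loader(lib_name)
        except OSError as exc:
            # Remember the most recent failure so it can be chained below.
            last_error = exc
    # "raise ... from last_error" sets __cause__, so the traceback of the
    # last failed load attempt is preserved in the final error report.
    raise OSError("Could not find any NVRTC library") from last_error
```

Only the last failure is chained here; earlier failures are dropped, which matches the wording of the suggestion but loses detail when several candidates fail for different reasons.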

for lib_name in nvrtc_libs:
    try:
        return ctypes.CDLL(lib_name)
    except OSError:
Collaborator

Probably should at least be a logger.debug with exc_info=True
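
For illustration, the logging suggested here could be sketched as follows. The function name try_load_libraries, the log message, and the injectable loader parameter are assumptions made for this example and do not come from the PR.

```python
import ctypes
import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


def try_load_libraries(
    lib_names: Sequence[str],
    loader: Callable[[str], object] = ctypes.CDLL,
) -> object:
    """Return the first library that loads, logging each failed attempt."""
    for lib_name in lib_names:
        try:
            return loader(lib_name)
        except OSError:
            # exc_info=True attaches the active exception and its traceback
            # to the log record instead of silently discarding it.
            logger.debug("Failed to load %s", lib_name, exc_info=True)
    raise OSError("Could not find any NVRTC library")
```

Because the message is emitted at DEBUG level, it stays invisible under default logging settings and only shows up when a user enables debug output while investigating a load failure.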

atalman (Contributor Author) commented Sep 24, 2025

@pytorchmergebot merge

pytorch-bot bot added the ciflow/trunk label Sep 24, 2025

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

atalman (Contributor Author) commented Sep 24, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes "Critical CI fix" -c critical

pytorchbot (Collaborator) commented

Cherry picking #163642

Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 9d0d98acfe7e3051046ec5854e8b106f2aedd6c2 returned non-zero exit code 1

Auto-merging torch/cuda/_utils.py
CONFLICT (content): Merge conflict in torch/cuda/_utils.py
error: could not apply 9d0d98acfe7... Use cuda nvrtc so file based on cuda version used by torch (#163642)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Details for Dev Infra team Raised by workflow job

Labels

ciflow/trunk, Merged, release notes: cuda

Development

Successfully merging this pull request may close these issues.

libnvrtc.so failure in TestCompileKernel.test_compile_kernel

5 participants