Add python bindings for NCCL CTA policies #164309

Closed
lakshayg wants to merge 2 commits into pytorch:main from lakshayg:nccl-cta-policy

@lakshayg lakshayg commented Sep 30, 2025

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci

NCCLConfig can now be constructed with non-default CTA policies:

```python
import torch
from torch.distributed import ProcessGroupNCCL as nccl

config = nccl.NCCLConfig()
config.cta_policy = nccl.NCCL_CTA_POLICY_ZERO  # NCCL version >= 2.28
```
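
Because `NCCL_CTA_POLICY_ZERO` is only bound when PyTorch is built against a new enough NCCL, calling code may want to detect the constant before using it. The following is a minimal sketch, not part of this PR: `make_config` is a hypothetical helper, and the `fake_old` / `fake_new` objects are stand-ins for the real `ProcessGroupNCCL` module so the snippet runs without a GPU build.

```python
from types import SimpleNamespace


def make_config(nccl, policy_name="NCCL_CTA_POLICY_ZERO"):
    """Build an NCCLConfig, setting cta_policy only if the constant is bound.

    `nccl` is expected to look like torch.distributed.ProcessGroupNCCL:
    it exposes NCCLConfig and, on new enough builds, the policy constant.
    """
    config = nccl.NCCLConfig()
    policy = getattr(nccl, policy_name, None)
    if policy is not None:
        config.cta_policy = policy
    return config


# Stand-in for a build without NCCL_CTA_POLICY_ZERO (hypothetical, for illustration).
fake_old = SimpleNamespace(NCCLConfig=SimpleNamespace)
# Stand-in for a build that has it; the value 2 is a placeholder, not NCCL's real value.
fake_new = SimpleNamespace(NCCLConfig=SimpleNamespace, NCCL_CTA_POLICY_ZERO=2)

print(hasattr(make_config(fake_old), "cta_policy"))  # False
print(make_config(fake_new).cta_policy)              # 2
```

With a real install, the same helper would be called as `make_config(nccl)` after `from torch.distributed import ProcessGroupNCCL as nccl`; on older builds the config is simply left at its default policy.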
@pytorch-bot pytorch-bot bot added the oncall: distributed and release notes: distributed (c10d) labels Sep 30, 2025
pytorch-bot bot commented Sep 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164309

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b79e2a4 with merge base 60f0a35:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eqy eqy added the ciflow/trunk label Sep 30, 2025
Skylion007 previously approved these changes Sep 30, 2025
@Skylion007 Skylion007 dismissed their stale review September 30, 2025 22:50

Whoops, pulled the trigger too quickly.

@lakshayg lakshayg self-assigned this Oct 1, 2025
@lakshayg lakshayg moved this to In Progress in PyTorch + CUDA Oct 1, 2025
```cpp
processGroupNCCL.def_property_readonly_static(
    "NCCL_CTA_POLICY_EFFICIENCY",
    [](const py::object&) { return NCCL_CTA_POLICY_EFFICIENCY; });
#ifdef NCCL_CTA_POLICY_ZERO // requires NCCL version >= 2.28
```
Where is this defined? I don't see NCCL_CTA_POLICY_ZERO defined anywhere in PyTorch or on the NCCL GitHub. Might as well set it conditionally based on the NCCL version.

@Skylion007 Can this comment be resolved, or would you prefer that I change it to use the NCCL version?
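
For comparison, the version-based alternative raised in this thread can be sketched on the Python side as a plain tuple comparison against 2.28, the version cited in the diff comment. This is an illustration only: `supports_cta_policy_zero` is a hypothetical helper, and it assumes the caller already has the NCCL version as a `(major, minor, ...)` tuple, for example from `torch.cuda.nccl.version()`.

```python
def supports_cta_policy_zero(nccl_version):
    """True if (major, minor, ...) is at least 2.28, the version the PR cites."""
    return tuple(nccl_version[:2]) >= (2, 28)


print(supports_cta_policy_zero((2, 28, 3)))  # True
print(supports_cta_policy_zero((2, 27, 5)))  # False
```

Since the binding in the diff sits behind `#ifdef NCCL_CTA_POLICY_ZERO`, checking whether the attribute exists on `ProcessGroupNCCL` is the more direct test of what a given build exposes; a version check like this one is mainly useful for producing a clearer error message.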

eqy commented Oct 7, 2025

@pytorchmergebot merge

@pytorchmergebot
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@github-project-automation github-project-automation bot moved this from In Progress to Done in PyTorch + CUDA Oct 7, 2025
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
NCCLConfig can now be constructed with non-default [cta policies][1]

```python
import torch
from torch.distributed import ProcessGroupNCCL as nccl

config = nccl.NCCLConfig()
config.cta_policy = nccl.NCCL_CTA_POLICY_ZERO  # NCCL version >= 2.28
```

[1]: https://docs.nvidia.com/deeplearning/nccl/archives/nccl_2283/user-guide/docs/api/flags.html#nccl-communicator-cta-policy-flags

Pull Request resolved: pytorch#164309
Approved by: https://github.com/eqy
Labels

ciflow/trunk, Merged, oncall: distributed, open source, release notes: distributed (c10d)

Projects

Status: Done

5 participants