Test Copy Engine All-to-all #170344
Dr. CI: 1 new failure as of commit e212bdf with merge base 8c5e14f. Artifacts and rendered test results: hud.pytorch.org/pr/170344
(This comment was automatically generated by Dr. CI and updates every 15 minutes.)
test/distributed/test_ce_colls.py
Outdated
# if self.rank == 0:
#     prof.export_chrome_trace("test_ce_alltoall.json")
It's left here on purpose, for when a trace dump is needed from this test :)
No, I don't think leaving a comment like this makes sense.
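One way to keep the trace dump available without leaving commented-out code in the test is to gate it behind an opt-in switch. The following is only a minimal sketch: the environment variable `CE_TEST_DUMP_TRACE` and the helpers `should_dump_trace` and `maybe_export_trace` are hypothetical names used for illustration, not part of this PR or of PyTorch. `prof` is assumed to be a profiler object exposing `export_chrome_trace`, as in the snippet under review.

```python
import os

# Hypothetical opt-in switch; this name is an illustration, not a PyTorch or PR setting.
TRACE_ENV_VAR = "CE_TEST_DUMP_TRACE"


def should_dump_trace(rank, env=None):
    """Return True only on rank 0 and only when the opt-in variable is set to "1"."""
    env = os.environ if env is None else env
    return rank == 0 and env.get(TRACE_ENV_VAR) == "1"


def maybe_export_trace(prof, rank, path="test_ce_alltoall.json", env=None):
    """Export a Chrome trace from a profiler object if the switch is on.

    Returns the path written, or None when the dump is skipped.
    """
    if not should_dump_trace(rank, env):
        return None
    prof.export_chrome_trace(path)
    return path
```

With something like this, the test body could call `maybe_export_trace(prof, self.rank)` unconditionally, and the dump would only happen when the variable is set in the environment.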
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed. Reason: 1 job has failed: trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu)
@pytorchbot merge -f "pr_time_benchmark failure is unrelated"
Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
NCCL 2.28 added Copy Engine (CE) support.

Conditions:
- Tensors must be symmetrically registered (e.g. coming from `symm_mem.empty`)
- `NCCL_CTA_POLICY_ZERO` must be passed to `ncclConfig`, or the env var `NCCL_CTA_POLICY=2` must be set

Confirmed use of CE via profile:

<img width="612" height="167" alt="Screenshot 2025-12-12 at 2 44 23 PM" src="https://github.com/user-attachments/assets/5efb6e9c-40a4-43a0-878f-36733b8b64dd" />

(First kernel is from `all_to_all_single` on regular tensors, second kernel is from `all_to_all_single` on tensors that have been window registered)

Caveat: As of NCCL 2.28.9, CE collectives cannot be run on the default stream, so we are testing them with `async_op=True` or with a side stream.

Pull Request resolved: pytorch#170344
Approved by: https://github.com/fduwjj
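The conditions and the caveat above can be sketched roughly as follows. This is an illustrative sketch, not the code in `test/distributed/test_ce_colls.py`: the helper names `enable_ce_policy` and `ce_all_to_all` are made up here, the process group is assumed to be already initialized with the NCCL backend on a CUDA device, and any further registration step beyond allocating with `symm_mem.empty` (for example backend selection or a rendezvous) is not shown and may be required in practice.

```python
import os


def enable_ce_policy(env=None):
    """Set NCCL_CTA_POLICY=2, the env-var form of the CTA-policy condition.

    Assumption: this runs before the NCCL communicator is created.
    Returns the mapping that was modified.
    """
    env = os.environ if env is None else env
    env["NCCL_CTA_POLICY"] = "2"
    return env


def ce_all_to_all(numel_per_rank=1024, use_side_stream=False):
    """Run all_to_all_single on symm_mem tensors, off the default stream.

    Assumes torch.distributed is already initialized with the NCCL backend.
    """
    # Imported lazily so enable_ce_policy() works without a CUDA build of PyTorch.
    import torch
    import torch.distributed as dist
    import torch.distributed._symmetric_memory as symm_mem

    world_size = dist.get_world_size()
    numel = world_size * numel_per_rank  # equal splits across ranks
    device = torch.device("cuda", torch.cuda.current_device())

    # Allocate from symmetric memory, per the first condition.
    inp = symm_mem.empty(numel, dtype=torch.float32, device=device)
    out = symm_mem.empty(numel, dtype=torch.float32, device=device)
    inp.fill_(float(dist.get_rank()))

    if use_side_stream:
        # Caveat: avoid the default stream by issuing the collective on a side stream.
        side = torch.cuda.Stream()
        side.wait_stream(torch.cuda.current_stream())  # order after fill_ above
        with torch.cuda.stream(side):
            dist.all_to_all_single(out, inp)
        torch.cuda.current_stream().wait_stream(side)
    else:
        # Caveat: alternatively, issue the collective asynchronously and wait on it.
        work = dist.all_to_all_single(out, inp, async_op=True)
        work.wait()
    return out
```

Whether the copy engine is actually used would still need to be confirmed with a profile, as in the screenshot above.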