Add option to FakeProcessGroup to raise error if comms are invoked. #162841
ezyang wants to merge 3 commits into gh/ezyang/3151/base
Dr. CI: ❌ 1 new failure as of commit 36f0bab with merge base d633bac.
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours).
Merge failed. Reason: 1 mandatory check failed.
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours).
Merge failed. Reason: 1 job has failed: trunk / linux-jammy-rocm-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.gfx942.4)
@pytorchbot merge -i
Merge started: your change will be merged while ignoring the following check: trunk / linux-jammy-rocm-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.gfx942.4)
Pull Request resolved: pytorch#162841
Approved by: https://github.com/dcci
The current behavior is for collectives on a FakeProcessGroup to do nothing,
which silently leaves data incorrect. If you're doing something similar to
LocalTensor, where you override the behavior of collectives to actually
compute their numerical results, this is unwelcome: a collective that falls
through to the fake group produces wrong values without complaint. Raising an
error when this happens helps prevent silent numerical incorrectness.
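To make the failure mode concrete, here is a minimal pure-Python sketch under stated assumptions. `ToyFakeProcessGroup`, its `error_on_collective` flag, `ToySimulatedCollectives`, and `CollectiveInvokedError` are all hypothetical names invented for illustration; they do not reproduce the actual FakeProcessGroup or LocalTensor APIs, and plain lists stand in for tensors. The simulator computes `all_reduce` itself but lets `broadcast` fall through to the fake group, which by default returns its input unchanged.

```python
from dataclasses import dataclass
from typing import Dict, List


class CollectiveInvokedError(RuntimeError):
    """Raised when a collective reaches a fake group configured to reject comms."""


@dataclass
class ToyFakeProcessGroup:
    """Hypothetical stand-in for a fake process group; NOT the PyTorch API.

    It carries rank/world_size metadata but never communicates.
    """

    rank: int
    world_size: int
    error_on_collective: bool = False  # hypothetical option name

    def _maybe_raise(self, op_name: str) -> None:
        if self.error_on_collective:
            raise CollectiveInvokedError(
                f"{op_name} reached a fake process group; its result would be wrong"
            )

    def all_reduce(self, values: List[float]) -> List[float]:
        self._maybe_raise("all_reduce")
        return values  # no-op: NOT the cross-rank sum

    def broadcast(self, values: List[float], src: int) -> List[float]:
        self._maybe_raise("broadcast")
        return values  # no-op: NOT rank src's data


class ToySimulatedCollectives:
    """Hypothetical LocalTensor-like simulator: it sees every rank's data at
    once and computes all_reduce numerically, but forwards anything it does
    not implement (here, broadcast) to the fake group."""

    def __init__(self, fake_pg: ToyFakeProcessGroup) -> None:
        self.fake_pg = fake_pg

    def all_reduce(self, per_rank: Dict[int, List[float]]) -> Dict[int, List[float]]:
        # Elementwise sum over ranks, replicated back to every rank.
        total = [sum(col) for col in zip(*per_rank.values())]
        return {r: list(total) for r in per_rank}

    def broadcast(
        self, per_rank: Dict[int, List[float]], src: int
    ) -> Dict[int, List[float]]:
        # Not simulated: falls through to the fake group, one call per rank.
        return {r: self.fake_pg.broadcast(v, src) for r, v in per_rank.items()}


if __name__ == "__main__":
    data = {0: [1.0, 2.0], 1: [10.0, 20.0]}

    lenient = ToySimulatedCollectives(ToyFakeProcessGroup(rank=0, world_size=2))
    print(lenient.all_reduce(data))        # {0: [11.0, 22.0], 1: [11.0, 22.0]}
    print(lenient.broadcast(data, src=0))  # {0: [1.0, 2.0], 1: [10.0, 20.0]}

    strict = ToySimulatedCollectives(
        ToyFakeProcessGroup(rank=0, world_size=2, error_on_collective=True)
    )
    try:
        strict.broadcast(data, src=0)
    except CollectiveInvokedError as err:
        print("caught:", err)
```

With the flag off, the unsimulated broadcast leaves rank 1 holding its own data instead of rank 0's, and nothing complains; with it on, the gap surfaces immediately as an exception, which is the trade-off this change is after.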
Authored with Claude Code.
Signed-off-by: Edward Yang <ezyang@meta.com>
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci