
Removing unnecessary wait_tensor interception in LocalTensor #169734

Closed
dzmitry-huba wants to merge 1 commit into gh/dzmitry-huba/18/base from gh/dzmitry-huba/18/head

Conversation

@dzmitry-huba
Contributor

dzmitry-huba commented Dec 6, 2025

Stack from ghstack (oldest at bottom):

The base implementation of wait_tensor pops the pending work from the registry, waits
on it, and then returns the passed-in object. Not draining the registry
(as the current implementation incorrectly does) leaks work objects
and may produce incorrect results when using LocalRunner.

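As a rough illustration of that contract, a minimal sketch follows; WORK_REGISTRY, FakeWork, and the storage-pointer keying are illustrative stand-ins, not the actual c10d internals.

```python
# Minimal sketch of the base wait_tensor contract described above.
# All names here are hypothetical stand-ins for the real c10d machinery.
from typing import Dict

import torch


class FakeWork:
    """Stand-in for a c10d Work handle returned by an async collective."""

    def wait(self) -> None:
        pass  # a real handle would block until the collective completes


# Pending work, keyed by the output tensor's storage pointer.
WORK_REGISTRY: Dict[int, FakeWork] = {}


def wait_tensor(tensor: torch.Tensor) -> torch.Tensor:
    # Pop (not merely read) the pending work so the registry is drained;
    # leaving the entry behind is exactly the leak described above.
    work = WORK_REGISTRY.pop(tensor.untyped_storage().data_ptr(), None)
    if work is not None:
        work.wait()
    # The base implementation returns the object that was passed in.
    return tensor
```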
@pytorch-bot
Copy link

pytorch-bot bot commented Dec 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169734

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1ed7463 with merge base fbe9a5b:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the release notes: distributed (c10d) label Dec 6, 2025
dzmitry-huba added a commit that referenced this pull request Dec 6, 2025

ghstack-source-id: 9d21e77
Pull Request resolved: #169734
    # Diff context: LocalTensor's dispatch chain for intercepted
    # functional collectives.
        return _c10d._local_functional_reduce_scatter_tensor(*args, **kwargs)
    elif func is torch.ops._c10d_functional.all_to_all_single.default:
        return _c10d._local_functional_all_to_all_single(*args, **kwargs)
    # The wait_tensor branch below is the interception removed by this PR.
    elif func is torch.ops._c10d_functional.wait_tensor.default:
Contributor

Is the explanation of what went wrong here that we intercepted the wait_tensor ops, but lost the part of their implementation that interacted with the work registry?

Follow-up question: if we are doing LocalTensor stuff, we are not doing any actual collectives in the first place, so we shouldn't have actual work objects registered to a process group, right? Should we stop pushing wait objects into the global registry when they are not genuine waits? Maybe this does not matter; I'm not sure.

Contributor Author

#1 Correct, we lost the part that interacted with the WorkRegistry.

#2 While we intercept some functional collectives, we do not intercept all of them (for example, those whose output allocation depends on inputs that, under local tensor, may be specific per rank). Others may be implemented in terms of standard collectives; those implementations take the returned Work object (even if it is fake) and register it with the registry. Hence the waits may be fake, but the work registry is still non-empty.
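A short sketch of the two paths described in #2; every name below is a hypothetical stand-in for the real LocalTensor/c10d machinery, not the actual implementation.

```python
# Illustrative only: why the registry can be non-empty under LocalTensor.
from typing import Dict

import torch


class FakeWork:
    """Fake Work handle: there is no real async collective to wait on."""

    def wait(self) -> None:
        pass


# Hypothetical global registry of pending work, keyed by storage pointer.
WORK_REGISTRY: Dict[int, FakeWork] = {}


def intercepted_all_reduce(t: torch.Tensor) -> torch.Tensor:
    # Path 1: the op is intercepted and simulated directly, so no Work
    # handle is ever created or registered.
    return t.clone()


def fallthrough_all_gather(t: torch.Tensor) -> torch.Tensor:
    # Path 2: the op is implemented on top of a standard collective, whose
    # returned Work handle (fake or not) is registered as usual.
    out = t.clone()
    WORK_REGISTRY[out.untyped_storage().data_ptr()] = FakeWork()
    return out


_ = intercepted_all_reduce(torch.ones(4))  # leaves the registry untouched
out = fallthrough_all_gather(torch.ones(4))
# Even though the wait is a no-op, wait_tensor must still pop the entry;
# intercepting wait_tensor without draining leaves FakeWork objects behind.
work = WORK_REGISTRY.pop(out.untyped_storage().data_ptr())
work.wait()
assert not WORK_REGISTRY
```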

@dzmitry-huba
Contributor Author

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk label Dec 6, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

umechand-amd pushed a commit to ROCm/pytorch that referenced this pull request Dec 8, 2025
…#169734)

Pull Request resolved: pytorch#169734
Approved by: https://github.com/dolpm
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
Pull Request resolved: #169734
Approved by: https://github.com/dolpm
github-actions bot deleted the gh/dzmitry-huba/18/head branch January 6, 2026 02:19

Labels

ciflow/trunk, Merged, release notes: distributed (c10d)
