
Introduce missing collectives and small fixes to support local tensor mode in AutoParallel #168110

Closed
dzmitry-huba wants to merge 2 commits into gh/dzmitry-huba/13/base from gh/dzmitry-huba/13/head

Conversation

@dzmitry-huba
Contributor

@dzmitry-huba dzmitry-huba commented Nov 18, 2025

This PR introduces support for additional functional collectives used in AutoParallel.

Another change is to the semantics of tolist() on LocalTensor. Previously, LocalTensor would reconcile first and then return a single result that is the same on all ranks. AutoParallel uses tolist() to compute all-to-all splits during token dispatch and combine, which requires per-rank values rather than a single reconciled one.
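For illustration, here is a minimal sketch of the token-dispatch pattern referred to above, assuming the functional all_to_all_single from torch.distributed._functional_collectives; the helper and tensor names are illustrative, not the PR's actual code. The point is that the splits tensor differs per rank, so a tolist() that reconciles first would hand every rank the same (wrong) split sizes:

```python
# Sketch only (hypothetical helper): why tolist() must yield per-rank
# values when computing all-to-all splits under LocalTensorMode.
import torch
import torch.distributed._functional_collectives as funcol

def dispatch_tokens(tokens: torch.Tensor, splits: torch.Tensor, group) -> torch.Tensor:
    # `splits[i]` is how many tokens this rank sends to rank i; it differs
    # across ranks, so reconciling before tolist() would be incorrect.
    input_splits = splits.tolist()
    # Exchange split sizes so each rank learns how much it will receive.
    recv_splits = funcol.all_to_all_single(splits, None, None, group).tolist()
    # Dispatch the tokens using the per-rank split sizes.
    return funcol.all_to_all_single(tokens, recv_splits, input_splits, group)
```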

Stack from ghstack (oldest at bottom):

@pytorch-bot pytorch-bot bot added the release notes: distributed (c10d) release notes category label Nov 18, 2025
@pytorch-bot

pytorch-bot bot commented Nov 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/168110

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit c27928b with merge base fb6af11:

BROKEN TRUNK - The following job failed but was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

dzmitry-huba added a commit that referenced this pull request Nov 18, 2025
… mode in AutoParallel

ghstack-source-id: 5fd8ff5
Pull Request resolved: #168110
@dzmitry-huba dzmitry-huba marked this pull request as ready for review November 18, 2025 23:15


_LOCAL_TENSOR_MODE: list["LocalTensorMode"] = []
_GLOBAL_LOCAL_TENSOR_MODE: list["LocalTensorMode"] = []
Contributor

err, so what's the global mode lol
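For context, a module-level mode stack like this is usually managed with a push/pop context manager around the region where the mode is active. The sketch below is purely illustrative of that general pattern and is not the PR's code; it does not answer what distinguishes the two stacks:

```python
# Illustrative sketch of the usual push/pop pattern for a module-level
# mode stack; not the PR's implementation.
from contextlib import contextmanager

_LOCAL_TENSOR_MODE: list = []  # stack of currently active modes

@contextmanager
def local_tensor_mode(mode):
    _LOCAL_TENSOR_MODE.append(mode)  # enter: this mode becomes innermost
    try:
        yield mode
    finally:
        _LOCAL_TENSOR_MODE.pop()     # exit: restore the previous mode
```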

…ocal tensor mode in AutoParallel"

This PR introduces support for additional functional collectives used in AutoParallel.

Another change is to the semantics of tolist() on LocalTensor. Previously, LocalTensor would reconcile first and then return a single result that is the same on all ranks. AutoParallel uses tolist() to compute all-to-all splits during token dispatch and combine, which requires per-rank values rather than a single reconciled one.

[ghstack-poisoned]
dzmitry-huba added a commit that referenced this pull request Nov 19, 2025
… mode in AutoParallel

ghstack-source-id: 4b51e84
Pull Request resolved: #168110
@dzmitry-huba
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 19, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@dzmitry-huba
Contributor Author

@pytorchbot help

@pytorch-bot

pytorch-bot bot commented Nov 19, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'help' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick} ...

Try @pytorchbot --help for more info.

@dzmitry-huba
Contributor Author

@pytorchbot --help

@pytorch-bot

pytorch-bot bot commented Nov 19, 2025

PyTorchBot Help

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick} ...

In order to invoke the bot on your PR, include a line that starts with
@pytorchbot anywhere in a comment. That line will form the command; no
multi-line commands are allowed. Some commands may be used on issues as specified below.

Example:
    Some extra context, blah blah, wow this PR looks awesome

    @pytorchbot merge

optional arguments:
  -h, --help            Show this help message and exit.

command:
  {merge,revert,rebase,label,drci,cherry-pick}
    merge               Merge a PR
    revert              Revert a PR
    rebase              Rebase a PR
    label               Add label to a PR
    drci                Update Dr. CI
    cherry-pick         Cherry pick a PR onto a release branch

Merge

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Merge an accepted PR, subject to the rules in .github/merge_rules.json.
By default, this will wait for all required checks (lint, pull) to succeed before merging.

optional arguments:
  -f MESSAGE, --force MESSAGE
                        Merge without checking anything. This requires a reason for auditing purposes, for example:
                        @pytorchbot merge -f 'Minor update to fix lint. Expecting all PR tests to pass'
                        
                        Please use `-f` as a last resort; prefer `--ignore-current` to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.
  -i, --ignore-current  Merge while ignoring the currently failing jobs.  Behaves like -f if there are no pending jobs.
  -ic                   Old flag for --ignore-current. Deprecated in favor of -i.
  -r [{viable/strict,main}], --rebase [{viable/strict,main}]
                        Rebase the PR to re-run checks before merging. Accepts viable/strict or main as branch options, defaulting to viable/strict if not specified.

Revert

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}

Revert a merged PR. This requires that you are a Meta employee.

Example:
  @pytorchbot revert -m="This is breaking tests on trunk. hud.pytorch.org/" -c=nosignal

optional arguments:
  -m MESSAGE, --message MESSAGE
                        The reason you are reverting, will be put in the commit message. Must be longer than 3 words.
  -c {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}, --classification {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}
                        A machine-friendly classification of the revert reason.

Rebase

usage: @pytorchbot rebase [-s | -b BRANCH]

Rebase a PR. Rebasing defaults to the stable viable/strict branch of pytorch.
Repeat contributors may use this command to rebase their PRs.

optional arguments:
  -s, --stable          [DEPRECATED] Rebase onto viable/strict
  -b BRANCH, --branch BRANCH
                        Branch you would like to rebase to

Label

usage: @pytorchbot label labels [labels ...]

Adds labels to a PR or Issue [Can be used on Issues]

positional arguments:
  labels  Labels to add to given Pull Request or Issue [Can be used on Issues]

Dr CI

usage: @pytorchbot drci 

Update Dr. CI. Updates the Dr. CI comment on the PR in case it's gotten out of sync with actual CI results.

Cherry-pick

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Cherry pick a pull request onto a release branch for inclusion in a release

optional arguments:
  --onto ONTO, --into ONTO
                        Branch you would like to cherry pick onto (Example: release/2.1)
  --fixes FIXES         Link to the issue that your PR fixes (Example: https://github.com/pytorch/pytorch/issues/110666)
  -c {regression,critical,fixnewfeature,docs,release}, --classification {regression,critical,fixnewfeature,docs,release}
                        A machine-friendly classification of the cherry-pick reason.
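
Example (illustrative; uses only the flags documented above):

  @pytorchbot cherry-pick --onto release/2.1 -c critical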

@pytorchmergebot
Collaborator

This PR (#168110) was merged in 6c02dde but it is still open, likely due to a GitHub bug, so mergebot is closing it manually. If you think this is a mistake, please feel free to reopen and contact Dev Infra.

XueningXu pushed a commit to XueningXu/pytorch that referenced this pull request Nov 19, 2025
… mode in AutoParallel (pytorch#168110)

This PR introduces support for additional functional collectives used in AutoParallel.

Another change is to the semantics of tolist() on LocalTensor. Previously, LocalTensor would reconcile first and then return a single result that is the same on all ranks. AutoParallel uses tolist() to compute all-to-all splits during token dispatch and combine, which requires per-rank values rather than a single reconciled one.

Pull Request resolved: pytorch#168110
Approved by: https://github.com/ezyang
@github-actions github-actions bot deleted the gh/dzmitry-huba/13/head branch December 20, 2025 02:16

Labels

ciflow/trunk (Trigger trunk jobs on your pull request) · Merged · release notes: distributed (c10d) (release notes category)

3 participants