DTensor: add more foreach ops to supported sharding prop list #132066
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132066
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (8 unrelated failures) As of commit 5036dd9 with merge base a356a03. BROKEN TRUNK: the following jobs failed but were already present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
else:
    kwargs_schema[k] = v
    local_kwargs[k] = v
before landing this, I probably need to:
(1) try adding a sharding prop rule for aten._foreach_mul
(2) see if that fixes the repro and add tests for it
I think this partially fixes #132016. It doesn't fully fix it, though, because: (1) we don't have a sharding prop rule for `aten._foreach_mul_`, and (2) when we don't have a sharding prop rule, we assume that the op's schema has a flat list of inputs that doesn't require any nested unflattening.

cc XilunWu H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o
    aten._foreach_addcmul_.Tensor,
    aten._foreach_clamp_max_.Scalar,
    aten._foreach_clamp_min_.Scalar,
    aten._foreach_div_.List,
Should we also add `aten._foreach_div_.Scalar` and `aten._foreach_div.Scalar` in this PR?
yep good call
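(For context, the change boils down to extending the foreach overload table that DTensor's pointwise-op module iterates over when registering sharding strategies. Below is a rough sketch of that kind of table; the variable name and the exact set of overloads are illustrative rather than a copy of `_pointwise_ops.py`.)

```python
import torch

aten = torch.ops.aten

# Illustrative table of foreach overloads of the kind this PR adds; DTensor
# keeps similar lists in _pointwise_ops.py and registers a pointwise-style
# sharding strategy for each entry.
extra_foreach_overloads = [
    aten._foreach_addcmul_.Tensor,
    aten._foreach_clamp_max_.Scalar,
    aten._foreach_clamp_min_.Scalar,
    aten._foreach_div_.List,
    aten._foreach_div_.Scalar,
    aten._foreach_div.Scalar,
    aten._foreach_mul_.List,
    aten._foreach_mul_.Scalar,
]
```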
args_schema.append(arg)
local_args.append(arg)

tree_map(arg_to_spec, args)
hmm, trying to understand this more: we do have pytree to flatten the args input for certain ops, i.e. the foreach op list has pytree flattening enabled https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/ops/_pointwise_ops.py#L641
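(As a rough standalone illustration, not DTensor's actual dispatch code, this is what the pytree flattening buys for foreach-style args: `tree_map` recurses into the nested lists of tensors instead of treating each list as an opaque argument.)

```python
import torch
from torch.utils._pytree import tree_map

def arg_to_spec(arg):
    # Toy stand-in for DTensor's real arg_to_spec: just report tensor shapes.
    return tuple(arg.shape) if isinstance(arg, torch.Tensor) else arg

# foreach-style args: two lists of tensors plus a scalar
args = (
    [torch.randn(2, 3), torch.randn(4)],
    [torch.randn(2, 3), torch.randn(4)],
    2.0,
)

# tree_map recurses into the lists, so each tensor element gets its own "spec"
print(tree_map(arg_to_spec, args))
# -> ([(2, 3), (4,)], [(2, 3), (4,)], 2.0)
```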
oh right, is the idea that DTensor wants to manually specify which ops actually need the pytree machinery, so their usage is more limited (to avoid the perf impact?)
In that case, we can probably tweak this code to only do the tree_map here for ops that have opted into "requiring pytrees" (probably by checking the flag that you just linked)
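(A minimal sketch of that idea; the dataclass and flag names below are meant to mirror DTensor's pytree opt-in flag, but they are standalone stand-ins rather than the real dispatch code.)

```python
from dataclasses import dataclass
from torch.utils._pytree import tree_map

@dataclass
class RuntimeSchemaInfo:
    # Mirrors the idea of an op opting into pytree handling (e.g. foreach ops).
    needs_pytree: bool = False

def args_to_specs(schema_info, arg_to_spec, args):
    # Only pay the pytree-flattening cost for ops that opted in; treat every
    # other op's args as a flat tuple.
    if schema_info is not None and schema_info.needs_pytree:
        return tree_map(arg_to_spec, args)
    return tuple(arg_to_spec(a) for a in args)
```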
> oh right, is the idea that DTensor wants to manually specify which ops actually need the pytree machinery, so their usage is more limited (to avoid the perf impact?)

Yep!

> In that case, we can probably tweak this code to only do the tree_map here for ops that have opted into "requiring pytrees" (probably by checking the flag that you just linked)
We're already sorta doing that a few lines above: https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/_dispatch.py#L291-L294
So I am wondering if there are some additional bugs that triggered that issue.
oh you are totally right... let me take another look
ah yep you're right. So:
(1) this logic does the right thing of operating on the flattened args, as long as we detect that the op's sharding rule has opted into using pytrees
(2) the only bug is that DTensor was missing sharding rules for a few foreach ops, causing DTensor to use the "default" path of not using pytrees, in a "bad" way that causes DTensor to loop infinitely (I think, before it gets a chance to error about not having a proper sharding rule).
hmm @wanchaol - is there any easy way to know ahead of time that an op has no sharding prop rule and thus will fail, so we can check it here? https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/_dispatch.py#L291
One option is maybe to detect when there's no sharding prop rule and error earlier, before we try to generate FakeTensor arguments.
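(A hedged sketch of that fail-fast idea; the registry attribute names below are assumptions about the sharding propagator's internals, included only for illustration.)

```python
def assert_has_sharding_rule(op_overload, sharding_propagator):
    # Check the (assumed) strategy/rule registries before building fake-tensor
    # args, so a missing rule surfaces as one clear error instead of whatever
    # the fallback path happens to do.
    registered = op_overload in getattr(
        sharding_propagator, "op_strategy_funcs", {}
    ) or op_overload in getattr(sharding_propagator, "op_to_rules", {})
    if not registered:
        raise NotImplementedError(
            f"DTensor has no sharding propagation rule registered for {op_overload}"
        )
```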
(for now, I just "fixed" the issue by updating the foreach_ops list that DTensor relies on)
> ah yep you're right. So:
> (1) this logic does the right thing of operating on the flattened args, as long as we detect that the op's sharding rule has opted into using pytrees
> (2) the only bug is that DTensor was missing sharding rules for a few foreach ops, causing DTensor to use the "default" path of not using pytrees, in a "bad" way that causes DTensor to loop infinitely (I think, before it gets a chance to error about not having a proper sharding rule).
> hmm @wanchaol - is there any easy way to know ahead of time that an op has no sharding prop rule and thus will fail, so we can check it here? https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/_dispatch.py#L291
> One option is maybe to detect when there's no sharding prop rule and error earlier, before we try to generate FakeTensor arguments.
For (2), I think the issue is that `_propagate_tensor_meta(op_schema)` (https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/_sharding_prop.py#L198) gets called first and errors out, so we could see all sorts of different errors (here are some more examples where we don't see the infinite loop but some other error: #124990) even though the true reason is that no sharding prop rule has been registered for the op.
I think this partially fixes #132016. It doesn't fully fix it, though, because: (1) we don't have a sharding prop rule for `aten._foreach_mul_`, and (2) when we don't have a sharding prop rule, we assume that the op's schema has a flat list of inputs that doesn't require any nested unflattening.

UPDATE: the linked repro passes now that I added the missing foreach overloads, so DTensor registers proper sharding rules for them.

cc XilunWu H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o
wanchaol left a comment:
lgtm thanks!
Please change the title and summary to reflect the newest changes :)
@pytorchbot merge
Merge failed. Reason: This PR needs a label; to add one, you can comment to pytorchbot. (Details for Dev Infra team: raised by workflow job.)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #132016.
Right now, if you run an op for which DTensor has no sharding prop rule, and that op accepts non-trivial pytrees of input tensors as arguments, DTensor can end up in an infinite loop before it has the chance to error out about the missing sharding prop rule.
This PR doesn't fix that general problem, but it adds rules for the culprit ops (the missing foreach ops).
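For reference, here is a single-rank sketch of the kind of case that hits this path (assumes a world-size-1 gloo setup; this is illustrative, not the exact repro from #132016):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

# single-process "distributed" setup so we can build a DeviceMesh
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

mesh = DeviceMesh("cpu", [0])
self_tensors = [distribute_tensor(torch.randn(8, 4), mesh, [Shard(0)]) for _ in range(3)]
other_tensors = [distribute_tensor(torch.randn(8, 4), mesh, [Shard(0)]) for _ in range(3)]

# Foreach ops take nested lists of tensors as args; without a sharding prop
# rule for the overload, DTensor falls back to the flat-args path described
# above. With the rule registered, this runs and mutates self_tensors in place.
torch._foreach_mul_(self_tensors, other_tensors)

dist.destroy_process_group()
```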
Stack from ghstack (oldest at bottom):
cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o