
[DTensor] ignore fresh unbacked symbols in shard prop#166989

Closed
pianpwk wants to merge 2 commits into gh/pianpwk/29/base from gh/pianpwk/29/head


Contributor

@pianpwk pianpwk commented Nov 4, 2025

This fixes 2 issues with the DTensor data-dependent test case:

  1. ShapeEnv not found when doing shard prop on data-dependent ops - fix was to detect the outer tracing fake mode. Maybe ShardingPropagator should just own a FakeMode & ShapeEnv for these purposes? The previous behavior was to initialize a new fake mode on every call.

  2. Pending unbacked symbols not found. This happens because DTensor dispatch runs fake prop twice: once while figuring out the output sharding,

    fake_out = op_schema.op(*fake_args, **fake_kwargs)

     and again to actually get the resulting local tensor:

    # normal case, run local sharded op computation
    local_results = op_call(*local_tensor_args, **op_info.local_kwargs)

     With data-dependent ops, both calls produce an unbacked symbol, but the symbols from the first invocation are never surfaced, which produces this error. The fix is to ignore pending symbols created at the sharding-propagation site.
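
The double-run problem in item 2 can be illustrated with a small pure-Python toy model. This is only a sketch under stated assumptions: `ToyShapeEnv`, `fake_nonzero`, `bind_outputs`, and `dispatch` are hypothetical stand-ins invented for illustration and are not the PyTorch implementation. Only the method name `ignore_fresh_unbacked_symbols` is borrowed from the review discussion.

```python
from contextlib import contextmanager


class ToyShapeEnv:
    """Hypothetical stand-in for a shape environment; not the PyTorch class."""

    def __init__(self):
        self.counter = 0
        self.pending = []  # fresh symbols that no output has claimed yet

    def create_unbacked_symint(self):
        sym = f"u{self.counter}"
        self.counter += 1
        self.pending.append(sym)
        return sym

    @contextmanager
    def ignore_fresh_unbacked_symbols(self):
        # Forget any symbols that were created inside the with-block.
        before = len(self.pending)
        try:
            yield
        finally:
            del self.pending[before:]


def fake_nonzero(env):
    """Toy data-dependent op: its output size is a fresh unbacked symbol."""
    return env.create_unbacked_symint()


def bind_outputs(env, outputs):
    """Toy tracer step: every pending symbol must show up in the outputs."""
    missing = [s for s in env.pending if s not in outputs]
    env.pending.clear()
    if missing:
        raise RuntimeError(f"pending unbacked symbols not found: {missing}")


def dispatch(env, suppress):
    """Run the toy op twice, mirroring the two fake-prop runs described above."""
    if suppress:
        with env.ignore_fresh_unbacked_symbols():
            fake_nonzero(env)  # first run: used only to pick a sharding
    else:
        fake_nonzero(env)  # first run leaks a symbol into the pending list
    result = fake_nonzero(env)  # second run: the result the tracer sees
    bind_outputs(env, [result])
    return result
```

In this model, `dispatch(ToyShapeEnv(), suppress=False)` raises because the symbol from the first run is still pending and never appears in an output, while `suppress=True` succeeds because only the second run's symbol is tracked.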

Stack from ghstack (oldest at bottom):

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci


pytorch-bot bot commented Nov 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166989

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5cdeade with merge base 82fa2aa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the "ciflow/inductor" and "oncall: distributed" labels Nov 4, 2025
  from torch.fx.experimental.proxy_tensor import disable_proxy_modes_tracing

- with FakeTensorMode(), disable_proxy_modes_tracing():
+ fake_mode = detect_fake_mode() or FakeTensorMode()
Contributor

@tugsbayasgalan tugsbayasgalan Nov 4, 2025


Should we also initialize a dummy shape env here? When there is no fake mode from the tracing context, the lines below will fail with errors like "NoneType doesn't have create_unbacked_symint". It seems to me that even in eager mode you would still run this fake tensor prop, right?
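
The concern above can be sketched with a toy model in plain Python. All names here (`ToyShapeEnv`, `ToyFakeMode`, `detect_toy_fake_mode`, `mode_for_shard_prop`) are hypothetical and only mimic the shape of the `detect_fake_mode() or FakeTensorMode()` line under review. The sketch assumes the fallback mode should carry its own shape environment so that unbacked symbols can still be created when no tracing mode is active.

```python
class ToyShapeEnv:
    """Hypothetical shape environment that hands out unbacked symbols."""

    def __init__(self):
        self.counter = 0

    def create_unbacked_symint(self):
        sym = f"u{self.counter}"
        self.counter += 1
        return sym


class ToyFakeMode:
    """Hypothetical fake mode; shape_env stays None unless one is passed in."""

    def __init__(self, shape_env=None):
        self.shape_env = shape_env


def detect_toy_fake_mode(active_modes):
    """Return the innermost active mode, or None when running eagerly."""
    return active_modes[-1] if active_modes else None


def mode_for_shard_prop(active_modes):
    # Reuse the outer tracing mode when there is one. Otherwise build a
    # fallback that owns a shape env, so data-dependent ops still work.
    return detect_toy_fake_mode(active_modes) or ToyFakeMode(
        shape_env=ToyShapeEnv()
    )
```

With an empty `active_modes` list (the eager case), `mode_for_shard_prop([]).shape_env.create_unbacked_symint()` works, whereas a bare `ToyFakeMode()` would fail with an `AttributeError` on `None`, which is the failure mode the comment describes.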

@pianpwk pianpwk added the "topic: not user facing" label Nov 4, 2025
@pianpwk pianpwk changed the title from "suppress fresh unbacked in shard prop" to "[DTensor] ignore fresh unbacked symbol in shard prop" Nov 4, 2025
@pianpwk pianpwk added the "release notes: distributed (dtensor)" label and removed the "topic: not user facing" label Nov 4, 2025
@pianpwk pianpwk changed the title from "[DTensor] ignore fresh unbacked symbol in shard prop" to "[DTensor] ignore fresh unbacked symbols in shard prop" Nov 4, 2025
y = torch.randint(1, (10,)).bool()
x_dt = distribute_tensor(x, device_mesh, placements=[Replicate()])
y_dt = distribute_tensor(y, device_mesh, placements=[Replicate()])
_dynamo_graph_capture_for_export(Foo())(x_dt, y_dt)
Contributor


@dzmitry-huba does this run with LocalTensor 🤔

@ezyang ezyang requested a review from laithsakka November 6, 2025 04:42
Contributor

ezyang commented Nov 6, 2025

@laithsakka ptal


- with FakeTensorMode(), disable_proxy_modes_tracing():
+ fake_mode = detect_fake_mode() or FakeTensorMode()
+ suppress_fresh_symbols_ctx = (
Contributor


  1. Can you add a comment explaining why ignore_fresh_unbacked_symbols() is safe here?

This fixes 2 issues with the DTensor data-dependent test case:

1) ShapeEnv not found when doing shard prop on data-dependent ops - fix was to detect the outer tracing fake mode. Maybe ShardingPropagator should just own a FakeMode & ShapeEnv for these purposes? The previous behavior was to initialize a new fake mode on every call.

2) Pending unbacked symbols not found. This happens because DTensor dispatch runs fake prop twice, once while figuring out the output sharding: https://github.com/pytorch/pytorch/blob/2bba37309bc8996fc6a190592e5ad9aac53761c9/torch/distributed/tensor/_sharding_prop.py#L175 and again to actually get the resulting local tensor: https://github.com/pytorch/pytorch/blob/2bba37309bc8996fc6a190592e5ad9aac53761c9/torch/distributed/tensor/_dispatch.py#L254-L255 With data-dependent ops, both calls will produce an unbacked symbol, but symbols in the first invocation are never surfaced, producing this error, so we ignore pending symbols from this site.




cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
pianpwk added a commit that referenced this pull request Nov 7, 2025
ghstack-source-id: 1fdb718
Pull Request resolved: #166989
Contributor Author

pianpwk commented Nov 7, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the "ciflow/trunk" label Nov 7, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Khanaksahu pushed a commit to Khanaksahu/pytorch that referenced this pull request Nov 17, 2025
ghstack-source-id: 0f80220
Pull Request resolved: pytorch/pytorch#166989
Silv3S pushed a commit to Silv3S/pytorch that referenced this pull request Nov 18, 2025
Pull Request resolved: pytorch#166989
Approved by: https://github.com/ezyang
@github-actions github-actions bot deleted the gh/pianpwk/29/head branch December 8, 2025 02:19

Labels

ciflow/inductor, ciflow/trunk, Merged, oncall: distributed, release notes: distributed (dtensor)

5 participants