[DTensor] ignore fresh unbacked symbols in shard prop by pianpwk · Pull Request #166989 · pytorch/pytorch

pianpwk · 2025-11-04T19:35:13Z

This fixes 2 issues with the DTensor data-dependent test case:

1) ShapeEnv not found when doing shard prop on data-dependent ops - fix was to detect the outer tracing fake mode. Maybe ShardingPropagator should just own a FakeMode & ShapeEnv for these purposes? The previous behavior was to initialize a new fake mode on every call.

2) Pending unbacked symbols not found. This happens because DTensor dispatch runs fake prop twice, once while figuring out the output sharding:

pytorch/torch/distributed/tensor/_sharding_prop.py

Line 175 in 2bba373

fake_out = op_schema.op(*fake_args, **fake_kwargs)

and again to actually get the resulting local tensor:

pytorch/torch/distributed/tensor/_dispatch.py

Lines 254 to 255 in 2bba373

# normal case, run local sharded op computation
local_results = op_call(*local_tensor_args, **op_info.local_kwargs)

With data-dependent ops, both calls produce an unbacked symbol, but the symbols from the first invocation are never surfaced, which produces this error. We therefore ignore pending symbols from this site.
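To make the second issue concrete, here is a minimal pure-Python sketch of the bookkeeping involved. `ToyShapeEnv`, `fake_prop`, `dispatch`, and the symbol names are hypothetical stand-ins, not PyTorch's actual classes. The sketch only assumes that every fresh unbacked symbol is queued as pending until an output claims it, and that a symbol left unclaimed is reported as an error.

```python
from contextlib import contextmanager


class ToyShapeEnv:
    """Hypothetical stand-in for a shape environment (not torch's ShapeEnv)."""

    def __init__(self):
        self.pending = []      # fresh symbols that no output has claimed yet
        self._ignore = False   # True while inside ignore_fresh_symbols()
        self._counter = 0

    def create_unbacked_symbol(self):
        name = f"u{self._counter}"
        self._counter += 1
        if not self._ignore:
            self.pending.append(name)
        return name

    @contextmanager
    def ignore_fresh_symbols(self):
        # Symbols created inside this block are never queued as pending.
        prev, self._ignore = self._ignore, True
        try:
            yield
        finally:
            self._ignore = prev

    def claim(self, output_symbols):
        """Bind pending symbols to an output; leftover symbols are an error."""
        missing = [s for s in self.pending if s not in output_symbols]
        self.pending.clear()
        if missing:
            raise RuntimeError(f"pending unbacked symbols not found: {missing}")


def fake_prop(env):
    """Model one fake run of a data-dependent op: it mints one new symbol."""
    return [env.create_unbacked_symbol()]


def dispatch(env, suppress):
    # First run: sharding propagation inspects the output, then drops it.
    if suppress:
        with env.ignore_fresh_symbols():
            fake_prop(env)
    else:
        fake_prop(env)
    # Second run: the local computation, whose output is actually returned.
    out = fake_prop(env)
    env.claim(out)
    return out
```

In this toy model, `dispatch(ToyShapeEnv(), suppress=False)` raises because the symbol from the first run is still pending when the second run's output is claimed, while `suppress=True` returns normally. Suppression is harmless in the model only because the first run's output is discarded; whether that holds in the real code path is the question the reviewers raise.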

Stack from ghstack (oldest at bottom):

-> [DTensor] ignore fresh unbacked symbols in shard prop #166989

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci

pytorch-bot · 2025-11-04T19:35:16Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166989

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5cdeade with merge base 82fa2aa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

tugsbayasgalan · 2025-11-04T20:10:04Z

torch/distributed/tensor/_sharding_prop.py

        from torch.fx.experimental.proxy_tensor import disable_proxy_modes_tracing

-        with FakeTensorMode(), disable_proxy_modes_tracing():
+        fake_mode = detect_fake_mode() or FakeTensorMode()


Should we also initialize a dummy shape env here? When there is no fake mode from the tracing context, the lines below will fail with errors like "NoneType doesn't have create_unbacked_symint". It seems to me that even in eager mode you would do this fake tensor prop, right?
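The concern in this comment can be illustrated with a small pure-Python model. `ToyFakeMode`, `ToyShapeEnv`, `pick_fake_mode`, and `run_data_dependent_op` are hypothetical names, not PyTorch APIs. The sketch assumes only that a freshly constructed fallback mode has no shape environment unless one is passed in.

```python
class ToyShapeEnv:
    """Hypothetical stand-in; it only counts the symbols it creates."""

    def __init__(self):
        self.count = 0

    def create_unbacked_symint(self):
        self.count += 1
        return f"u{self.count - 1}"


class ToyFakeMode:
    """Hypothetical stand-in for a fake mode that may carry a shape env."""

    def __init__(self, shape_env=None):
        self.shape_env = shape_env


def pick_fake_mode(outer_mode, with_dummy_env):
    """Reuse the outer tracing mode if present, else build a fallback."""
    if outer_mode is not None:
        return outer_mode
    return ToyFakeMode(shape_env=ToyShapeEnv() if with_dummy_env else None)


def run_data_dependent_op(mode):
    # A data-dependent op needs the shape env to mint an output-size symbol.
    return mode.shape_env.create_unbacked_symint()
```

With no outer mode and no dummy environment, `run_data_dependent_op(pick_fake_mode(None, False))` raises `AttributeError` on `None`, which mirrors the failure described in the comment. Passing `with_dummy_env=True` avoids it in this model.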

ezyang · 2025-11-06T04:41:42Z

test/distributed/tensor/test_dtensor_export.py

+        y = torch.randint(1, (10,)).bool()
+        x_dt = distribute_tensor(x, device_mesh, placements=[Replicate()])
+        y_dt = distribute_tensor(y, device_mesh, placements=[Replicate()])
+        _dynamo_graph_capture_for_export(Foo())(x_dt, y_dt)


@dzmitry-huba does this run with LocalTensor 🤔

ezyang · 2025-11-06T04:42:36Z

@laithsakka ptal

laithsakka · 2025-11-06T20:49:27Z

torch/distributed/tensor/_sharding_prop.py


-        with FakeTensorMode(), disable_proxy_modes_tracing():
+        fake_mode = detect_fake_mode() or FakeTensorMode()
+        suppress_fresh_symbols_ctx = (


Can you add a comment explaining why ignore_fresh_unbacked_symbols() is safe here?

pianpwk · 2025-11-07T18:13:03Z

@pytorchbot merge

pytorchmergebot · 2025-11-07T18:15:09Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

suppress fresh unbacked in shard prop

3244828

[ghstack-poisoned]

pytorch-bot bot added ciflow/inductor oncall: distributed Add this issue/PR to distributed oncall triage queue labels Nov 4, 2025

pianpwk mentioned this pull request Nov 4, 2025

[DTensor] statically_known_true for slice strategy #166990

Closed

tugsbayasgalan reviewed Nov 4, 2025

pianpwk added the topic: not user facing topic category label Nov 4, 2025

pianpwk changed the title ~~suppress fresh unbacked in shard prop~~ [DTensor] ignore fresh unbacked symbol in shard prop Nov 4, 2025

pianpwk added release notes: distributed (dtensor) release notes category and removed topic: not user facing topic category labels Nov 4, 2025

pianpwk requested review from bdhirsh, ezyang and wconstab November 4, 2025 20:52

pianpwk changed the title ~~[DTensor] ignore fresh unbacked symbol in shard prop~~ [DTensor] ignore fresh unbacked symbols in shard prop Nov 4, 2025

ezyang reviewed Nov 6, 2025

ezyang approved these changes Nov 6, 2025

ezyang requested a review from laithsakka November 6, 2025 04:42

laithsakka reviewed Nov 6, 2025

pianpwk added a commit that referenced this pull request Nov 7, 2025

suppress fresh unbacked in shard prop

f3e0a62

ghstack-source-id: 1fdb718 Pull Request resolved: #166989

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 7, 2025

pytorchmergebot added the merging label Nov 7, 2025

pytorchmergebot added the Merged label Nov 7, 2025

pytorchmergebot closed this in 8eb2130 Nov 7, 2025

pytorchmergebot removed the merging label Nov 7, 2025

Khanaksahu pushed a commit to Khanaksahu/pytorch that referenced this pull request Nov 17, 2025

suppress fresh unbacked in shard prop

d4dc561

ghstack-source-id: 0f80220 Pull Request resolved: pytorch/pytorch#166989

github-actions bot deleted the gh/pianpwk/29/head branch December 8, 2025 02:19

pianpwk wants to merge 2 commits into gh/pianpwk/29/base from gh/pianpwk/29/head


5 participants

# normal case, run local sharded op computation
local_results = op_call(*local_tensor_args, **op_info.local_kwargs)
