
Revert #168264 + Python-side LRU cache when native op schema is not supported #168269

Closed

ezyang wants to merge 5 commits into gh/ezyang/3204/base from gh/ezyang/3204/head

Conversation

@ezyang (Contributor) commented Nov 20, 2025

Stack from ghstack (oldest at bottom):

This reverts #168264 but with a bugfix for the reason why it was reverted.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Nov 20, 2025
Authored with claude code

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: b37844f
Pull-Request: #168269
py_op.ptr(),
args.ptr(),
kwargs.ptr());
py::object sharding = checked_vectorcall(
@ezyang (Contributor Author) commented on this diff:

review with "ignore whitespace changes" enabled

@ezyang requested review from wconstab and zpcore November 20, 2025 17:04
@wconstab (Contributor) commented:

you're working on a test case?

@pytorch-bot bot commented Nov 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/168269

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 41 Pending

As of commit 09b6fe1 with merge base d3ccb8f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ezyang (Contributor Author) commented Nov 20, 2025

> you're working on a test case?

yes. unfortunately claude didn't write a good test lol

@albanD (Collaborator) left a comment


Needs a test to make sure we don't regress again.
SGTM once added

checked_vectorcall(
sharding_propagator.attr("propagate").ptr(),
py_op_info.ptr());
cached_sharding = py_op_info.attr(dtensor_interned_strings.output_sharding);
Collaborator commented on this diff:

Should this be added to the C++ cache here to make subsequent calls faster?

@ezyang (Contributor Author) replied:

The problem is that when opt_native_op_schema fails to be returned, the C++ side just has no cache key to look up against, so this really wouldn't help (we HAVE to go through the pytree path before we can compute the cache key).
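
For context, a minimal Python sketch of why that is: without a native op schema, the only way to build a hashable cache key is from the pytree-flattened arguments, so the flattening work has to happen before any cache lookup can. Everything below (`_flatten_to_key`, `_propagate_slow`, the plain-dict cache) is illustrative, not the actual DTensor internals.

```python
import torch
from torch.utils._pytree import tree_flatten

# Illustrative stand-in for the Python-side cache; a real implementation
# would bound its size (LRU), this dict does not.
_py_cache: dict = {}

def _flatten_to_key(op_name, args, kwargs):
    # The key can only be derived from the flattened arguments: replace
    # tensors with hashable metadata so different shapes/dtypes get
    # different cache entries.
    flat, spec = tree_flatten((args, kwargs))
    meta = tuple(
        (tuple(t.shape), t.dtype) if isinstance(t, torch.Tensor) else t
        for t in flat
    )
    return (op_name, repr(spec), meta)

def _propagate_slow(op_name, args, kwargs):
    # Placeholder for the real (expensive) sharding propagation.
    return f"output sharding for {op_name}"

def propagate_with_python_cache(op_name, args, kwargs):
    # Without a native op schema there is no precomputed C++ key, so the
    # pytree walk above has to run on every call; caching only pays off
    # for whatever work comes after it.
    key = _flatten_to_key(op_name, args, kwargs)
    if key not in _py_cache:
        _py_cache[key] = _propagate_slow(op_name, args, kwargs)
    return _py_cache[key]
```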

if (opt_native_op_schema.has_value()) {

// Use Python-side LRU cache when native cache is not available
if (!opt_native_op_schema.has_value()) {
Member commented on this diff:

Note: the slow path will be triggered when the input args contain a List[Tensor] or contain symbolic shapes.
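
A hedged sketch of that condition; the predicate below paraphrases the note above and is not the actual C++ check.

```python
import torch

def native_op_schema_available(args) -> bool:
    # Hypothetical mirror of the C++ fast-path check: return False (take
    # the Python slow path) when an argument is a list of tensors or when
    # a tensor has symbolic (SymInt) sizes.
    for a in args:
        if isinstance(a, (list, tuple)) and any(isinstance(x, torch.Tensor) for x in a):
            return False
        if isinstance(a, torch.Tensor) and any(isinstance(s, torch.SymInt) for s in a.shape):
            return False
    return True
```

In practice that presumably puts tensor-list ops (e.g. the foreach ops) and dynamic-shape workloads on the Python slow path.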

[ghstack-poisoned]
@ezyang added the "topic: bug fixes" (topic category) label Nov 20, 2025
[ghstack-poisoned]
ezyang added a commit that referenced this pull request Nov 20, 2025
Authored with claude code

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: dcebadd
Pull-Request: #168269
@ezyang added the "release notes: distributed (dtensor)" (release notes category) label Nov 20, 2025
@ezyang (Contributor Author) commented Nov 20, 2025

@pytorchbot merge

@pytorch-bot bot added the "ciflow/trunk" (Trigger trunk jobs on your pull request) label Nov 20, 2025
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (distributed, 3, 3, linux.rocm.gpu.gfx942.4)

Details for Dev Infra team (raised by workflow job).

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Nov 21, 2025
Authored with claude code

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 2748b69
Pull-Request: #168269
@ezyang (Contributor Author) commented Nov 21, 2025

The latest implementation is better and has lots of comments; I basically undid all the Claude code stuff.

kwargs.ptr(),
py_op_info.ptr());
py_op_info.ptr(),
/*try_cache*/ !opt_native_op_schema.has_value() ? Py_True : Py_False);
Contributor commented on this diff:

IIUC, 2 major changes since the last rev:

  1. we now do the "if sharding is actually a tensor, return early" bit 100% of the time that we call into python, whereas previously we only did it on the 'slow path' and not on the 'fast path'
  2. we always call the 'slow path' entrypoint back to python, and added the caching logic into it (gated by try_cache) rather than having 2 callouts to python in c++

LGTM!
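
For readers following along, a rough Python-side sketch of what that single entrypoint could look like. Only the `/*try_cache*/ !opt_native_op_schema.has_value()` call site above comes from the diff; the names `propagate_op_sharding`, `_propagate`, and `_propagate_cached` are made up for illustration.

```python
from functools import lru_cache

import torch

def _propagate(op_info):
    # Placeholder for ShardingPropagator.propagate(); for a few ops the
    # result is a plain tensor rather than an OutputSharding-like object.
    return f"output sharding for {op_info}"

@lru_cache(maxsize=1024)  # bound is arbitrary for this sketch
def _propagate_cached(op_info):
    # lru_cache needs hashable arguments, which is why the real code can
    # only take this branch once a usable cache key has been built.
    return _propagate(op_info)

def propagate_op_sharding(op_info, try_cache: bool):
    # (2) Single callout from C++; try_cache mirrors
    # `!opt_native_op_schema.has_value()` at the call site above.
    result = _propagate_cached(op_info) if try_cache else _propagate(op_info)
    if isinstance(result, torch.Tensor):
        # (1) "Sharding is actually a tensor, return early" now runs on
        # every call into Python, fast path and slow path alike.
        return result
    # Otherwise return the OutputSharding-like result for C++ to act on.
    return result
```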

@ezyang (Contributor Author) replied:

if i were a better man, i'd have written tests for these cases lol

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Nov 21, 2025
Authored with claude code

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 2748b69
Pull-Request: #168269
@ezyang changed the title from "Used Python-side LRU cache when native op schema is not supported" to "Revert #168264 + Python-side LRU cache when native op schema is not supported" Nov 21, 2025
@pytorch-bot bot added the "ci-no-td" (Do not run TD on this PR) label Nov 21, 2025
@ezyang (Contributor Author) commented Nov 21, 2025

@pytorchbot merge -f "rebuild not necessary, previous commit is identical"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
…upported (#168269)

This reverts #168264 but with a bugfix for the reason why it was reverted.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #168269
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/zpcore, https://github.com/malfet
tiendatngcs pushed a commit to tiendatngcs/pytorch-Dec25 that referenced this pull request Dec 10, 2025
Authored with claude code

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: dcebadd
Pull-Request: pytorch/pytorch#168269
tiendatngcs pushed a commit to tiendatngcs/pytorch-Dec25 that referenced this pull request Dec 10, 2025
Authored with claude code

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 5240429
Pull-Request: pytorch/pytorch#168269
@github-actions github-actions bot deleted the gh/ezyang/3204/head branch December 22, 2025 02:20

Labels

ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, release notes: distributed (dtensor) (release notes category), topic: bug fixes (topic category)


6 participants