Skip to content

[annotation] Override metadata on regenerated node in functional mode#166200

Closed
yushangdi wants to merge 1 commit intomainfrom
annotation_replay
Closed

[annotation] Override metadata on regenerated node in functional mode#166200
yushangdi wants to merge 1 commit intomainfrom
annotation_replay

Conversation

@yushangdi
Copy link
Contributor

@yushangdi yushangdi commented Oct 24, 2025

Fixes #165810

If we regenerate a node during functionalization, we override the "stack_trace", "custom", and "seq_nr" metadata of the regenerated node with the node meta of the original node.

python test/functorch/test_aot_joint_with_descriptors.py -k test_preserve_annotate_replay_view
python test/functorch/test_aotdispatch.py TestAOTAutogradWithDynamo.test_duplicated_arguments_on_tensor_overlap

cc @ezyang @EikanWang @jgong5 @wenzhe-nrv

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166200

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (10 Unrelated Failures)

As of commit ea81061 with merge base 8daef35 (image):

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: fx release notes category label Oct 24, 2025
torch._from_functional_tensor(args[0].elem)
].proxy.node
with fx_traceback.set_current_replay_node(curr_node):
torch._sync(a)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably what's breaking the tests. Why is it inconvenient to annotate later during the normal replay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezyang see details in #165810. We need to know the original node's information before replay so the replayed nodes has the correct stack trace (stack trace of the original node being replayed) and seq_nr.

@yushangdi yushangdi force-pushed the annotation_replay branch 4 times, most recently from 26813e9 to a5292b6 Compare October 27, 2025 19:58
@yushangdi yushangdi requested a review from Chillee as a code owner October 27, 2025 19:58
@yushangdi yushangdi force-pushed the annotation_replay branch 2 times, most recently from a5aa81f to 57fa5b0 Compare October 27, 2025 23:02
@yushangdi yushangdi requested a review from albanD as a code owner October 27, 2025 23:02
if "seq_nr" in replay_node.meta:
annotation_log.debug("seq_nr from replay_node")
new_seq_nr = replay_node.meta["seq_nr"]
node.meta["is_functional_regenerated"] = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we actually need this info? I guess it doesn't hurt to include

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not used now. I just think it could be helpful for debugging later. e.g. if we see that the seq_nr/stack_trace is not consistent with some other metadata on the node, and if we see "is_functional_regenerated", we can confirm that this is due to regeneration.

if "custom" in replay_node.meta:
node.meta["custom"] = replay_node.meta.get("custom")
if "stack_trace" in replay_node.meta:
node.meta["stack_trace"] = replay_node.meta.get("stack_trace")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems a bit inconsistent to manually set the node.meta["stack_trace"] in this code path, which only runs when replay_node is set. Is there a shared codepath we can use for setting the stack_trace metadata?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, looking up at line 189 it seems like this will give us inconsistent values between node.meta['stack_trace'] and node.stack_trace

Copy link
Contributor Author

@yushangdi yushangdi Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node.stack_trace is the same as node.meta['stack_trace']. it's a property that does "return self.meta.get("stack_trace", None)". I changed to use node.stack_trace now to be more consistent with the other code, but it's the same as setting node.meta.

I put it at the bottom so 1) the metadata can override whatever we copied from _COPY_META_FIELDS above, and 2) the metadata overriding logic are all in the same code block.

@yushangdi yushangdi force-pushed the annotation_replay branch 2 times, most recently from df8b94a to a2c7b42 Compare October 28, 2025 03:42
@yushangdi yushangdi requested a review from bdhirsh October 28, 2025 03:52
Copy link
Contributor

@bdhirsh bdhirsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

@yushangdi
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 28, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@github-actions github-actions bot deleted the annotation_replay branch November 29, 2025 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request fx Merged release notes: fx release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stack trace and annotation on node is "wrong" after aot_export_joint_with_descriptors because of regenerate_from_base

5 participants