Skip to content

Improve fake tensor leakage detection in export by not relying on gc too much#163516

Closed
tugsbayasgalan wants to merge 2 commits intogh/tugsbayasgalan/38/basefrom
gh/tugsbayasgalan/38/head
Closed

Improve fake tensor leakage detection in export by not relying on gc too much#163516
tugsbayasgalan wants to merge 2 commits intogh/tugsbayasgalan/38/basefrom
gh/tugsbayasgalan/38/head

Conversation

@tugsbayasgalan
Copy link
Contributor

@tugsbayasgalan tugsbayasgalan commented Sep 22, 2025

Stack from ghstack (oldest at bottom):

Previously we relied on gc to get the snapshot of fake tensors before and after export to get list of fake tensors that are created during export. This caused some flakiness in our test suite (#162232). it seems super hard to make gc deterministic, so we just instrument fake tensor creation which seems lot better. In addition, it is also quite faster than previous approach becuase we are no longer manually triggering garbage collector.

cc @ezyang @EikanWang @jgong5 @wenzhe-nrv

Differential Revision: D82966648

@pytorch-bot
Copy link

pytorch-bot bot commented Sep 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163516

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit e727818 with merge base 0b59492 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@tugsbayasgalan
Copy link
Contributor Author

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 22, 2025
Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure why not

@ezyang
Copy link
Contributor

ezyang commented Sep 22, 2025

cc @eellison for fake tensor constructor modifications

Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we track constructors specifically ? We already track every fake tensor in the fake tensor converter.

@tugsbayasgalan
Copy link
Contributor Author

Ah i didn't think about fake tensor converter lol, let me check

…ying on gc too much"


Previously we relied on gc to get the snapshot of fake tensors before and after export to get list of fake tensors that are created during export. This caused some flakiness in our test suite (#162232). it seems super hard to make gc deterministic, so we just instrument fake tensor creation which seems lot better. In addition, it is also quite faster than previous approach becuase we are no longer manually triggering garbage collector. 


cc ezyang EikanWang jgong5 wenzhe-nrv

Differential Revision: [D82966648](https://our.internmc.facebook.com/intern/diff/D82966648)

[ghstack-poisoned]
tugsbayasgalan added a commit that referenced this pull request Sep 22, 2025
@tugsbayasgalan
Copy link
Contributor Author

@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@tugsbayasgalan
Copy link
Contributor Author

As @eellison discovered, we construct fake tensors in fake_impls as well, so fake tensor converter might not be the most reliable place for this instrumentation.

@tugsbayasgalan
Copy link
Contributor Author

@pytorchbot merge -f "Landed internally"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@facebook-github-bot
Copy link
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Copy link
Collaborator

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

@pytorchmergebot
Copy link
Collaborator

Can't merge closed PR #163516

dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…too much (pytorch#163516)

Previously we relied on gc to get the snapshot of fake tensors before and after export to get list of fake tensors that are created during export. This caused some flakiness in our test suite (pytorch#162232). it seems super hard to make gc deterministic, so we just instrument fake tensor creation which seems lot better. In addition, it is also quite faster than previous approach becuase we are no longer manually triggering garbage collector.

Differential Revision: [D82966648](https://our.internmc.facebook.com/intern/diff/D82966648)
Pull Request resolved: pytorch#163516
Approved by: https://github.com/ezyang
@github-actions github-actions bot deleted the gh/tugsbayasgalan/38/head branch October 23, 2025 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants