Introduce HOP for inductor compiled regions to allow torch dispatch (inductor_compiled_code) #167844

Closed

jamesjwu wants to merge 7 commits into gh/jamesjwu/207/base from gh/jamesjwu/207/head

Conversation

@jamesjwu (Contributor) commented Nov 14, 2025

Stack from ghstack (oldest at bottom):

This is a cleaned up version of the POC at https://github.com/pytorch/pytorch/pull/167752/files

This PR adds an inductor option, which you can pass into torch.compile, that wraps all inductor-generated code in a HOP, allowing it to be intercepted by torch dispatch modes.

This HOP is created in output_code.post_compile, so it's cache safe. The configuration to turn it on is part of `inductor_config`, and is therefore already part of the cache key. I've added a test that shows this HOP is cache safe.

Because this wrapper is created at compile time, there should be little to no CPU overhead from it, besides that of actually processing the torch_dispatch calls themselves.

The context here is that we want to be able to support compiled regions such as flex attention in eager mode, while working with other torch dispatch tracers like SAC (selective activation checkpointing). Will add more tests for SAC/flex attention specific things next.
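
For illustration, here is a minimal sketch of the intended usage. The config key `wrap_inductor_compiled_regions` is taken from the tests later in this thread; everything else is a plain TorchDispatchMode, so the exact option spelling is the only assumption:

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class LoggingMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        # With the option on, the whole compiled region surfaces here as a
        # single HOP call instead of disappearing into opaque inductor code.
        print(func)
        return func(*args, **(kwargs or {}))

def fn(x):
    return torch.sin(x) + 1

compiled = torch.compile(fn, options={"wrap_inductor_compiled_regions": True})
x = torch.randn(4)
compiled(x)  # compile once, outside the mode

with LoggingMode():
    compiled(x)  # the compiled region is now visible to the mode
```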

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela

jamesjwu added a commit that referenced this pull request Nov 14, 2025
ghstack-source-id: c76d43a
Pull Request resolved: #167844
@pytorch-bot (bot) commented Nov 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167844

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 1efd6c8 with merge base 0b3bdb0:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jamesjwu added a commit that referenced this pull request Nov 14, 2025
ghstack-source-id: f15a7c9
Pull Request resolved: #167844
jamesjwu added the topic: not user facing label Nov 14, 2025
jamesjwu added a commit that referenced this pull request Nov 14, 2025
ghstack-source-id: 1df4a29
Pull Request resolved: #167844
jamesjwu marked this pull request as ready for review November 14, 2025 17:19
jamesjwu requested a review from zou3519 as a code owner November 14, 2025 17:19
jamesjwu added a commit that referenced this pull request Nov 14, 2025
ghstack-source-id: 2684182
Pull Request resolved: #167844
jamesjwu requested a review from drisspg November 14, 2025 18:38
jamesjwu added a commit that referenced this pull request Nov 14, 2025
ghstack-source-id: 59d5651
Pull Request resolved: #167844
jamesjwu changed the title from Inductor compiled region hop to [WIP] Introduce HOP for inductor compiled regions to allow torch dispatch Nov 14, 2025

```python
# Wrap compiled regions in inductor_compiled_code HOP to make them visible to
# TorchDispatchModes like DebugMode and Selective Activation Checkpointing.
# This avoids runtime overhead of checking dispatch modes at every call.
```

Contributor:

Err, this comment is weird. When this config is on you are checking dispatch modes on every call, no?

```python
original_callable = self.current_callable

def wrapped_callable(inputs):
    return inductor_compiled_code(original_callable, inputs)
```

Contributor:

I think my old strategy was good (explicitly testing if there's a mode on) and you should do it. The HOP dispatch is quite slow and I want to move us towards having this code on by default.
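
A hedged sketch of that fast path, using the internal helper `_get_current_dispatch_mode` as the mode check; the check the PR actually lands may differ:

```python
from torch.utils._python_dispatch import _get_current_dispatch_mode

def wrapped_callable(inputs):
    # Fast path: no TorchDispatchMode is active, so skip the (slow) HOP
    # dispatch and call the inductor-compiled callable directly.
    if _get_current_dispatch_mode() is None:
        return original_callable(inputs)
    return inductor_compiled_code(original_callable, inputs)
```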

Contributor Author:

Will add

```python
self._boxed_call = True

# Store whether to wrap compiled regions in inductor_compiled_code HOP.
# This is set at compile time to avoid runtime overhead.
```

Contributor:

Uhh, sure, but this saving is dwarfed by the fact that you're always calling into the HOP now.


```python
inductor_compiled_code = InductorCompiledCode()
inductor_compiled_code.fallthrough(DispatchKey.AutogradCPU)
inductor_compiled_code.fallthrough(DispatchKey.AutogradCUDA)
```

Contributor:

@patrick-toulme do we need to add MTIA here too?

Contributor:

For the love of god someone please make DispatchKey.Autograd here work LOL


```python
# Use config.patch to enable wrapping at inductor level
with inductor_config.patch({"wrap_inductor_compiled_regions": True}):
    compiled_fn = torch.compile(
```

Contributor:

This test feels insufficient. I am specifically looking for a test where we SAC around a compiled region, but in every single one of these tests it seems you are still compiling around the SAC.
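
For reference, the shape of test being asked for looks roughly like the sketch below: eager-mode SAC applied around a compiled region via `torch.utils.checkpoint`. The function and policy here are placeholders, not the PR's actual tests:

```python
from functools import partial

import torch
from torch.utils.checkpoint import (
    CheckpointPolicy,
    checkpoint,
    create_selective_checkpoint_contexts,
)

def policy_fn(ctx, op, *args, **kwargs):
    # Placeholder policy: prefer recomputing everything inside the region.
    return CheckpointPolicy.PREFER_RECOMPUTE

def fn(x):
    return torch.sin(x).cos()

compiled_fn = torch.compile(fn, options={"wrap_inductor_compiled_regions": True})
x = torch.randn(4, requires_grad=True)

# SAC wraps the *compiled* region, rather than compiling around the SAC call.
out = checkpoint(
    compiled_fn,
    x,
    use_reentrant=False,
    context_fn=partial(create_selective_checkpoint_contexts, policy_fn),
)
out.sum().backward()
```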

@ezyang (Contributor) left a comment:

Stamping to unblock, but the tests seem a bit sloppy.

@jamesjwu (Contributor Author):

@pytorchbot merge

Going to address these comments in the next PR

pytorch-bot added the ciflow/trunk label Nov 17, 2025
@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@drisspg (Contributor) commented Nov 17, 2025

The one thing: this HOP is a singleton, and right now we can only ever annotate a region with one SAC policy, and not have per-graph SAC policies, right?

So this is the likely policy, right:

```python
def policy_fn(fn, *args, **kwargs):
    if fn is inductor_compiled_code:
        return MUST_SAVE
```

But do we foresee any places where a user would want to apply different policies? Maybe we should pass an fx_annotation into the op that users could match against.
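
A purely hypothetical sketch of that suggestion; neither the `annotation` argument nor this matching exists in the PR, it only illustrates the per-region idea:

```python
def policy_fn(ctx, op, *args, annotation=None, **kwargs):
    # Hypothetical: the HOP forwards a per-region tag attached at compile
    # time, so different compiled regions can get different SAC policies.
    if op is inductor_compiled_code and annotation == "flex_attention_block":
        return CheckpointPolicy.MUST_SAVE
    return CheckpointPolicy.PREFER_RECOMPUTE
```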

@pytorchmergebot (Collaborator):

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable)

Details for Dev Infra team: raised by workflow job.

jamesjwu added a commit that referenced this pull request Nov 17, 2025
ghstack-source-id: a6fcf01
Pull Request resolved: #167844
@jamesjwu (Contributor Author):

  • Fix tests, address code review

@jamesjwu (Contributor Author):

Hmm, that's a good point about multiple policies; it seems like this should be addable, let me think on it.

jamesjwu added a commit that referenced this pull request Nov 18, 2025
ghstack-source-id: a6a6025
Pull Request resolved: #167844
@ezyang (Contributor) commented Nov 18, 2025

@pytorchbot merge -i


@ezyang (Contributor) commented Nov 18, 2025

@pytorchbot merge -f "all unnecessary errors"

@pytorchmergebot (Collaborator):

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


github-actions bot deleted the gh/jamesjwu/207/head branch December 19, 2025 02:19
ezyang changed the title from Introduce HOP for inductor compiled regions to allow torch dispatch to Introduce HOP for inductor compiled regions to allow torch dispatch (inductor_compiled_code) Jan 16, 2026