Introduce HOP for inductor compiled regions to allow torch dispatch (inductor_compiled_code) #167844

jamesjwu wants to merge 7 commits into gh/jamesjwu/207/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167844. Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures) As of commit 1efd6c8 with merge base 0b3bdb0:

BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
torch/_inductor/config.py (Outdated)

```python
# Wrap compiled regions in inductor_compiled_code HOP to make them visible to
# TorchDispatchModes like DebugMode and Selective Activation Checkpointing.
# This avoids runtime overhead of checking dispatch modes at every call.
```
Err, this comment is weird. When this config is on you are checking dispatch mode every call, no?
torch/_inductor/output_code.py (Outdated)

```python
original_callable = self.current_callable

def wrapped_callable(inputs):
    return inductor_compiled_code(original_callable, inputs)
```
I think my old strategy was good (explicitly testing if there's a mode on) and you should do it. The HOP dispatch is quite slow and I want to be moving us towards having this code on by default.
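For illustration, a minimal sketch of that strategy, assuming the internal `torch._C._len_torch_dispatch_stack()` counter is the check used; `original_callable` and `inductor_compiled_code` are the names from the diff above, and the branch itself is not in this PR:

```python
# Sketch only: skip HOP dispatch entirely when no TorchDispatchMode is active.
# `_len_torch_dispatch_stack` is an internal API and an assumption here.
import torch

def wrapped_callable(inputs):
    if torch._C._len_torch_dispatch_stack() > 0:
        # A mode (e.g. SAC, DebugMode) is active: route through the HOP so the
        # whole compiled region is visible to it as a single call.
        return inductor_compiled_code(original_callable, inputs)
    # Fast path: no mode active, call the compiled code directly.
    return original_callable(inputs)
```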
```python
self._boxed_call = True

# Store whether to wrap compiled regions in inductor_compiled_code HOP
# This is set at compile time to avoid runtime overhead
```
Uhh, sure, but this saving is dwarfed by the fact that you're always calling into the HOP now
```python
inductor_compiled_code = InductorCompiledCode()
inductor_compiled_code.fallthrough(DispatchKey.AutogradCPU)
inductor_compiled_code.fallthrough(DispatchKey.AutogradCUDA)
```
@patrick-toulme do we need to add MTIA here too?
For the love of god someone please make DispatchKey.Autograd here work LOL
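One possible shape for that, sketched under the assumption that `DispatchKey.AutogradMTIA` is the relevant key for the MTIA question above (not something this PR does):

```python
# Sketch: register the fallthrough over a list of per-backend autograd keys
# instead of one call per backend. AutogradMTIA's inclusion is an assumption.
from torch._C import DispatchKey

for key in (
    DispatchKey.AutogradCPU,
    DispatchKey.AutogradCUDA,
    DispatchKey.AutogradMTIA,
):
    inductor_compiled_code.fallthrough(key)
```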
```python
# Use config.patch to enable wrapping at inductor level
with inductor_config.patch({"wrap_inductor_compiled_regions": True}):
    compiled_fn = torch.compile(
```
This test feels insufficient. I specifically am looking for a test where we SAC around a compiled region, but in every single one of these tests it seems you are still compiling around the SAC.
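For concreteness, a hedged sketch of the test shape being asked for: SAC applied around a compiled region in eager mode, rather than compiling around the SAC. The HOP name under `torch.ops.higher_order` and the config name are taken from this PR's title and test; the rest is illustrative:

```python
import torch
from torch.utils.checkpoint import (
    CheckpointPolicy,
    checkpoint,
    create_selective_checkpoint_contexts,
)

def policy_fn(ctx, func, *args, **kwargs):
    # Save the compiled region as a single unit; recompute everything else.
    if func is torch.ops.higher_order.inductor_compiled_code:  # assumed HOP name
        return CheckpointPolicy.MUST_SAVE
    return CheckpointPolicy.PREFER_RECOMPUTE

compiled_fn = torch.compile(
    lambda x: torch.relu(x @ x),
    options={"wrap_inductor_compiled_regions": True},
)

x = torch.randn(8, 8, requires_grad=True)
out = checkpoint(
    compiled_fn,
    x,
    use_reentrant=False,
    context_fn=lambda: create_selective_checkpoint_contexts(policy_fn),
)
out.sum().backward()
```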
ezyang left a comment:

Stamping to unblock, but the tests seem a bit sloppy.
@pytorchbot merge

Going to address these comments in the next PR
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours).
The one thing: this hop is a singleton, and right now we can only ever annotate a region with one SAC policy, right, and not have per-graph SAC policies? So this is the likely policy:

```python
def policy_fn(fn, *args, **kwargs):
    if fn == inductor_wraps_hop:
        return MUST_SAVE
```

But do we foresee any places where a user would want to do different policies? Maybe we should pass in an fx_annotation into the op that users could match against.
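If the op did carry such an annotation, a policy might look like this sketch; the `annotation` kwarg and the string matched against are hypothetical, not an existing API:

```python
import torch
from torch.utils.checkpoint import CheckpointPolicy

def policy_fn(ctx, func, *args, **kwargs):
    if func is torch.ops.higher_order.inductor_compiled_code:  # assumed HOP name
        # Hypothetical: match on an fx_annotation passed into the op.
        if kwargs.get("annotation") == "attention_block":
            return CheckpointPolicy.MUST_SAVE
    return CheckpointPolicy.PREFER_RECOMPUTE
```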
Merge failed. Reason: 1 job has failed, first few of them are: trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable). Raised by workflow job.
Hmm that's a good point about multiple policies — it seems like this should be addable, let me think on it
@pytorchbot merge -i
Merge started: Your change will be merged while ignoring the following 4 checks: trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge, unstable), inductor / inductor-cpu-test / test (cpu_inductor_torchbench, 2, 2, linux.2xlarge.amx), inductor / inductor-cpu-test / test (dynamic_cpu_inductor_torchbench, 2, 2, linux.2xlarge.amx), inductor / inductor-test / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu).
@pytorchbot merge -f "all unnecessary errors"
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Stack from ghstack (oldest at bottom):
This is a cleaned up version of the POC at https://github.com/pytorch/pytorch/pull/167752/files
This PR adds an inductor option which you can pass into torch.compile that wraps all inductor generated code in a HOP, allowing it to be read by torch dispatches.
This hop is created in output_code.post_compile, so it's cache safe. The configuration to turn it on is part of `inductor_config`, and therefore already part of the cache key. I've added a test that shows this HOP is cache safe. Because this wrapper occurs at compile time, there should be little to no cpu overhead from creating it, besides that of actually processing the torch_dispatches themselves.
The context here is we want to be able to support compiled regions such as flex attention in eager mode, while working with other torch dispatch tracers like SAC. Will add more tests for SAC/flex attention specific things next.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela
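A minimal end-to-end usage sketch, assuming the `wrap_inductor_compiled_regions` option name from the test above; the logging mode is just an illustrative TorchDispatchMode, not part of this PR:

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class LoggingMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print(func)  # the compiled region should appear as a single HOP call
        return func(*args, **(kwargs or {}))

fn = torch.compile(
    lambda x: torch.sin(x) + torch.cos(x),
    options={"wrap_inductor_compiled_regions": True},
)
fn(torch.randn(4))  # warm up / compile outside the mode

with LoggingMode():
    fn(torch.randn(4))
```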