
[inductor][ez] add overridable env var for disabling fx graph cache #166138

Closed
shunting314 wants to merge 1 commit into gh/shunting314/245/base from gh/shunting314/245/head

Conversation

@shunting314
Contributor

shunting314 commented Oct 23, 2025

Stack from ghstack (oldest at bottom):

I set TORCHINDUCTOR_FX_GRAPH_CACHE=0 a lot to make sure compilation
actually happens, by disabling fx graph caching. I even put this in my .bashrc.
But this causes a simple vllm script to fail:
https://gist.github.com/shunting314/4253b2b5ab5e7d1b0fc9516c84054904

Error log:
https://gist.github.com/shunting314/1d04bbeb58bc486f975684f56d65615d

The root cause:

  1. vllm patches inductor_config.fx_graph_cache to True here:
    https://github.com/vllm-project/vllm/blob/e255d929902dcf8968541d2cbf0d18f0fe3f9c49/vllm/compilation/compiler_interface.py#L308

    The code in vllm relies on the fx graph cache being on (unless
    VLLM_DISABLE_COMPILE_CACHE is overridden).

  2. Setting TORCHINDUCTOR_FX_GRAPH_CACHE=0 makes
    inductor_config.fx_graph_cache non-overridable.

I added TORCHINDUCTOR_FX_GRAPH_CACHE_DEFAULT so that it can still be used to skip the
fx graph cache while still allowing projects like vllm to override it (see the sketch
below).

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben
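
For illustration, here is a minimal sketch of the difference between the two env-var flavors. This is not the real inductor implementation; `read_bool_env` and `ConfigSketch` are made-up names:

```python
import os


def read_bool_env(name: str) -> bool | None:
    """Hypothetical helper: True/False for "1"/"0", None if the var is unset."""
    val = os.environ.get(name)
    if val is None:
        return None
    return val == "1"


class ConfigSketch:
    """Toy model of the two env-var flavors, not the real inductor config."""

    def __init__(self) -> None:
        # A *default* env var (TORCHINDUCTOR_FX_GRAPH_CACHE_DEFAULT) only
        # seeds the initial value; later assignments still win.
        default = read_bool_env("TORCHINDUCTOR_FX_GRAPH_CACHE_DEFAULT")
        self._fx_graph_cache = True if default is None else default

    @property
    def fx_graph_cache(self) -> bool:
        # A *force* env var (TORCHINDUCTOR_FX_GRAPH_CACHE) wins over any
        # later assignment; this is what broke vllm's patching.
        forced = read_bool_env("TORCHINDUCTOR_FX_GRAPH_CACHE")
        return self._fx_graph_cache if forced is None else forced

    @fx_graph_cache.setter
    def fx_graph_cache(self, value: bool) -> None:
        self._fx_graph_cache = value
```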

@pytorch-bot

pytorch-bot bot commented Oct 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166138

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit edd15b3 with merge base e20c9bf:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

shunting314 added a commit that referenced this pull request Oct 23, 2025

ghstack-source-id: 836f92f
Pull Request resolved: #166138
@BoyuanFeng
Contributor

In addition to 'VLLM_DISABLE_COMPILE_CACHE=0', we can also remove the cached code under '.cache/vllm/'.

@shunting314
Contributor Author

@BoyuanFeng that triggers compilation (rather than loading from saved artifacts), but this line will still fail since the compiled result is not saved to the cache: https://github.com/vllm-project/vllm/blob/e255d929902dcf8968541d2cbf0d18f0fe3f9c49/vllm/compilation/compiler_interface.py#L476

@shunting314
Contributor Author

A setting like TORCHINDUCTOR_FX_GRAPH_CACHE=0 causes the following behavior:

```python
import torch._inductor.config as inductor_config

inductor_config.fx_graph_cache = True
print(inductor_config.fx_graph_cache)  # prints False: TORCHINDUCTOR_FX_GRAPH_CACHE
# is wired up as an env_name_force field, so the env var overrides the assignment
```

The new env var avoids that and is safe to put in .bashrc.
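
Assuming the new env var is wired up as an env_name_default field (so it only seeds the initial value instead of forcing it), the override then behaves as expected. A sketch of the intended behavior:

```python
# With TORCHINDUCTOR_FX_GRAPH_CACHE_DEFAULT=0 in the environment (e.g. from
# .bashrc), the fx graph cache starts out disabled...
import torch._inductor.config as inductor_config

print(inductor_config.fx_graph_cache)  # False, seeded from the default env var

# ...but a later assignment, like vllm's patching, now takes effect:
inductor_config.fx_graph_cache = True
print(inductor_config.fx_graph_cache)  # True
```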

shunting314 added the `topic: not user facing` label Oct 24, 2025
@shunting314
Contributor Author

@pytorchbot merge

pytorch-bot added the `ciflow/trunk` (Trigger trunk jobs on your pull request) label Oct 24, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@cyyever
Collaborator

cyyever commented Oct 26, 2025

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@cyyever
Collaborator

cyyever commented Oct 26, 2025

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@shunting314
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@shunting314
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@shunting314
Contributor Author

@pytorchbot merge -f 'force merge since it fails for so many times..."

@pytorch-bot

pytorch-bot bot commented Oct 28, 2025

❌ 🤖 pytorchbot command failed:

Got EOF while in a quoted string
Try `@pytorchbot --help` for more info.
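
The failure is a quoting mistake in the command above: the message opens with a single quote but closes with a double quote, so the parser reaches end of input while still inside the quoted string. Assuming the bot tokenizes its arguments shlex-style, the parse error is easy to reproduce:

```python
import shlex

try:
    # Opens with ' but closes with ", so the quoted string never terminates.
    shlex.split("merge -f 'force merge since it fails for so many times...\"")
except ValueError as err:
    print(err)  # No closing quotation
```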

@shunting314
Contributor Author

@pytorchbot merge -f 'force merge since it fails for so many times...'

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

github-actions bot deleted the gh/shunting314/245/head branch November 28, 2025 02:16
