
[inductor] fix issue for example value with unbacked strides #163660

Closed

sevenEng wants to merge 2 commits into pytorch:main from sevenEng:q1l1/fix_autotune_unbacked_strides_example


Conversation

@sevenEng
Contributor

@sevenEng sevenEng commented Sep 23, 2025

Issue

During autotuning, size hints are not applied atomically to the example inputs used for benchmarking.

If an unbacked symint shows up in an input's strides, this can lead to a CUDA illegal memory access (IMA).

The added unit test reproduces it: with a stride of [128 * u0, 128, 1] and an unbacked fallback of 8192, calling benchmark_example_value returns a tensor with stride [8192, 128, 1] instead of [128 * 8192, 128, 1].

Fix

Use the atomic API when applying size hints to the input tensors' strides.
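
As a rough illustration of the failure mode (plain sympy, not the Inductor size-hint APIs; `naive_hint` and `atomic_hint` are made-up helpers), this is the difference between hinting the whole unbacked expression with the fallback versus substituting the fallback into it:

```python
# Standalone sketch of the bug described above; helper names are illustrative.
import sympy

u0 = sympy.Symbol("u0", positive=True, integer=True)
FALLBACK = 8192  # unbacked fallback used in the repro
stride = [128 * u0, 128, 1]

def naive_hint(expr, fallback):
    # Non-atomic behavior: an expression containing an unbacked symbol is
    # replaced wholesale by the fallback, dropping the 128x factor.
    if isinstance(expr, sympy.Expr) and expr.free_symbols:
        return fallback
    return int(expr)

def atomic_hint(expr, fallback):
    # Atomic behavior: substitute the fallback for the unbacked symbol
    # inside the expression, so compound strides stay consistent.
    if isinstance(expr, sympy.Expr) and expr.free_symbols:
        return int(expr.subs({s: fallback for s in expr.free_symbols}))
    return int(expr)

print([naive_hint(s, FALLBACK) for s in stride])   # [8192, 128, 1]     (wrong)
print([atomic_hint(s, FALLBACK) for s in stride])  # [1048576, 128, 1]  == [128 * 8192, 128, 1]
```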

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

@pytorch-bot

pytorch-bot bot commented Sep 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163660

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 0d8f1f5 with merge base 3a110c9:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

@sevenEng has imported this pull request. If you are a Meta employee, you can view this in D83066085.

@ColinPeppler
Contributor

Dumping some thoughts here, since this would be used for torch.compile GEMM autotuning too:

  1. If we graph break, do the sub-graphs share a single SizeVarAllocator?
  2. If we recompile, do both graphs share a single SizeVarAllocator?

Why do I care about SizeVarAllocator so much? We use it to cache unbacked substitutions when calling atomically_apply_size_hint.

@functools.lru_cache  # noqa: B019
def _sub_unbacked_exprs(self, expr: Expr) -> Expr:
    # it's fine to cache this fn since self is a singleton
    replacements = self._get_unbacked_replacements()
    while True:
        new_expr = expr.subs(replacements)
        if new_expr == expr:
            return new_expr
        expr = sympy.factor(new_expr)

def atomically_apply_size_hint(
    self, expr: Union[Expr, int], *, fallback: Optional[int] = None
) -> Union[Expr, int]:
    if isinstance(expr, (int, sympy.Integer)):
        return int(expr)

    if has_free_unbacked_symbols(expr):
        # Make sure to substitute with the factored version
        # e.g. 10*(s0 + u0) instead of 10*s0 + 10*u0
        expr = self._sub_unbacked_exprs(sympy.factor(expr))

I believe the answer to both (1) and (2) is Yes.

AFAIK every time we run Inductor, we create a brand new GraphLowering.

graph = GraphLowering(

When we create GraphLowering we also create SizeVarAllocator.

self.sizevars = SizeVarAllocator(shape_env)
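
To make the caching concern concrete, here is a toy stand-in for that pattern (`ToySizeVarAllocator` and `sub_unbacked_exprs` are invented names, not the real Inductor classes), showing why the lru_cache is only safe when each graph gets its own allocator instance:

```python
# Toy illustration of the caching pattern under discussion; simplified stand-ins,
# not the actual Inductor types.
import functools
import sympy

class ToySizeVarAllocator:
    def __init__(self, replacements):
        # Mirrors the fact that a fresh SizeVarAllocator is built per GraphLowering,
        # so this per-instance state is fixed for the lifetime of one graph.
        self._replacements = dict(replacements)

    @functools.lru_cache  # noqa: B019 -- acceptable only because instances are per-graph
    def sub_unbacked_exprs(self, expr):
        # Fixed-point substitution loop, mirroring the excerpt quoted above.
        while True:
            new_expr = expr.subs(self._replacements)
            if new_expr == expr:
                return new_expr
            expr = sympy.factor(new_expr)

u0, u1 = sympy.symbols("u0 u1", positive=True, integer=True)

# Each graph gets its own allocator, so cached results computed against one
# graph's replacements can never leak into another graph.
graph_a = ToySizeVarAllocator({u0: u1})
graph_b = ToySizeVarAllocator({u0: 2 * u1})
print(graph_a.sub_unbacked_exprs(128 * u0))  # 128*u1
print(graph_b.sub_unbacked_exprs(128 * u0))  # 256*u1
```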

@sevenEng sevenEng force-pushed the q1l1/fix_autotune_unbacked_strides_example branch from 865aca9 to 5e0f4ac on September 25, 2025 18:36
V.graph.sizevars.size_hints(
    node.get_stride(),
    fallback=config.unbacked_symint_fallback,
    hint_override=hint_override,
Contributor

Yeah, we should definitely keep hint_override here.

We should either make atomically_apply_size_hint available as an option in size_hints, or the other way around.

cc: @bobrenjc93 @pianpwk @ezyang for any thoughts
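
A purely hypothetical sketch of what such a unified entry point could look like; `size_hints_atomic` is an illustrative name rather than the existing Inductor API, and the assumed hint_override semantics (an explicit per-call override for symbolic expressions) may differ from what size_hints actually does:

```python
# Hypothetical sketch only: not the real Inductor API.
from typing import Optional, Sequence, Union

import sympy

def size_hints_atomic(
    sizevars,                                 # stand-in for V.graph.sizevars
    exprs: Sequence[Union[sympy.Expr, int]],  # e.g. node.get_stride()
    *,
    fallback: Optional[int] = None,
    hint_override: Optional[int] = None,
):
    hints = []
    for expr in exprs:
        if isinstance(expr, (int, sympy.Integer)):
            hints.append(int(expr))
        elif hint_override is not None:
            # Keep honoring an explicit override when the caller supplies one.
            hints.append(hint_override)
        else:
            # Otherwise take the atomic path so unbacked symbols inside
            # compound stride expressions are substituted consistently.
            hints.append(sizevars.atomically_apply_size_hint(expr, fallback=fallback))
    return tuple(hints)
```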

@sevenEng sevenEng force-pushed the q1l1/fix_autotune_unbacked_strides_example branch from 84b80e3 to 0d8f1f5 on October 14, 2025 16:46
@meta-codesync

meta-codesync bot commented Oct 14, 2025

@sevenEng has imported this pull request. If you are a Meta employee, you can view this in D83066085.

@sevenEng
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Oct 14, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@sevenEng sevenEng deleted the q1l1/fix_autotune_unbacked_strides_example branch October 14, 2025 20:26
zhudada0120 pushed a commit to zhudada0120/pytorch that referenced this pull request Oct 15, 2025
…#163660)

Pull Request resolved: pytorch#163660
Approved by: https://github.com/ColinPeppler
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
…#163660)

Pull Request resolved: pytorch#163660
Approved by: https://github.com/ColinPeppler
