[Inductor-FX] Generalize FloorDiv conversion to handle more complex launch grids. Remove python_slow grid mode. #163828

Closed
blaine-rister wants to merge 8 commits into main from brister/fx_no_python_slow

@blaine-rister blaine-rister commented Sep 25, 2025

Problem

Inductor's FX backend receives sympy expressions for Triton launch grids, and passes these to a tracer to generate equivalent FX IR. However, the tracer does not support all possible sympy expressions. In particular, it can't handle ops like floor and Pow which would be found in an expression like floor(x / y). Instead, it expects FloorDiv(x, y), which has the advantage that all intermediate values are integers, unlike x / y.
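As a small illustration of why integer intermediates matter, here is a plain-Python analogy (not Inductor code). The values are chosen for demonstration only: true division goes through a float, which cannot represent every large integer, while floor division stays exact.

```python
import math

# Illustrative values only: 10**17 + 1 is not exactly representable as a float.
x = 10**17 + 1
y = 1

# floor(x / y): the true division produces a float intermediate first.
via_float = math.floor(x / y)

# x // y: every intermediate value is an integer, so nothing is rounded.
via_floordiv = x // y

print(via_float == x)     # False: the float intermediate lost the low digit
print(via_floordiv == x)  # True
```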

Inductor's Python backend uses a trick where ceil(x / y) is computed in Python as -(x // -y), which is faster when evaluating Python launch grids at runtime. This trick generates more complex sympy expressions, so the FX backend introduced a "python_slow" mode using a more familiar form of ceil division. That mode is slower to evaluate, which increased production CPU usage. (Internal reviewers see T237853632.)
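The ceil-division trick can be checked in plain Python. In the sketch below, `ceil_div_fast` is the form quoted above. `ceil_div_familiar` is one common textbook form, included only for comparison. It is an assumption, not necessarily what "python_slow" mode emitted, and both function names are hypothetical.

```python
def ceil_div_fast(x: int, y: int) -> int:
    # Floor division rounds toward negative infinity, so negating the
    # divisor and then the result rounds toward positive infinity instead.
    return -(x // -y)


def ceil_div_familiar(x: int, y: int) -> int:
    # A common textbook form, valid for positive y (assumed for comparison).
    return (x + y - 1) // y


# Both forms agree on a range of non-negative numerators and positive divisors.
for x in range(0, 50):
    for y in range(1, 10):
        assert ceil_div_fast(x, y) == ceil_div_familiar(x, y)

print(ceil_div_fast(7, 2))  # 4
```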

Solution

To get the best of both worlds, this PR removes "python_slow" mode, and generalizes the replace_floor_div function to handle the more complex expressions resulting from the "python" grid mode. The new algorithm is conceptually similar to the existing one, except instead of analyzing only the first argument to a sympy.Mul op, it checks all factors, so it can handle expressions containing both Rational and Pow ops, among other cases. It also uses Mul.make_args to handle the case when the argument to floor is not a Mul. Finally, it uses expr.is_positive to check the sign of symbolic exponents.

This new algorithm is guaranteed to convert all floor ops to an equivalent expression using FloorDiv. (To see this, consider that floor(x) == FloorDiv(x, 1).) Note it may not remove all Pow ops, with a counterexample being floor(x / (2 + z ** y)), but it covers everything we've seen in practice for symbolic launch grids. In particular, it covers the typical case where Pow is a factor of the argument to floor, and the exponent is -1. In this situation, we move the Pow to the denominator of FloorDiv and the exponent becomes 1, eliminating the Pow op.
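The idea can be sketched with sympy as follows. This is a simplified illustration, not the PR's replace_floor_div implementation. `FloorDivSketch`, `split_factors` and `replace_floor_sketch` are hypothetical names, and `FloorDivSketch` is an undefined sympy function standing in for Inductor's FloorDiv op.

```python
import sympy

# Hypothetical stand-in for Inductor's FloorDiv: an undefined sympy function,
# so it stays symbolic and is never evaluated.
FloorDivSketch = sympy.Function("FloorDivSketch")


def split_factors(arg):
    """Split a product into (numerator, denominator) by checking every factor."""
    numerator, denominator = [], []
    # Mul.make_args returns (arg,) when arg is not a Mul, so one loop
    # covers both the product and non-product cases.
    for factor in sympy.Mul.make_args(arg):
        if factor.is_Rational:
            # A rational coefficient p/q contributes to both sides.
            numerator.append(sympy.Integer(factor.p))
            denominator.append(sympy.Integer(factor.q))
        elif factor.is_Pow and (-factor.exp).is_positive:
            # A provably negative exponent moves to the denominator, sign flipped.
            denominator.append(factor.base ** (-factor.exp))
        else:
            numerator.append(factor)
    return sympy.Mul(*numerator), sympy.Mul(*denominator)


def replace_floor_sketch(expr):
    """Rewrite every floor(arg) in expr as FloorDivSketch(numerator, denominator)."""
    return expr.replace(
        sympy.floor, lambda arg: FloorDivSketch(*split_factors(arg))
    )


x, y = sympy.symbols("x y", integer=True, positive=True)
expr = sympy.floor(x / (4 * y))  # sympy stores the argument as Mul(1/4, x, Pow(y, -1))
converted = replace_floor_sketch(expr)
print(converted)
```

Here the Rational factor 1/4 and the Pow factor y ** -1 both end up in the denominator, so the converted expression contains neither floor nor Pow. An argument that is not a Mul falls through to a denominator of 1, matching the floor(x) == FloorDiv(x, 1) observation.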

Test plan

This PR adds an end-to-end test for static padding with dynamic outer dimensions, which creates a difficult sympy expression that the existing algorithm would not be able to handle.

This PR also adds some unit tests for the replace_floor_div function. It can be difficult to construct end-to-end tests that expose all the trickiest expressions, as those tests have to pass through a number of other systems handling dynamic shapes. Therefore, it's easier to expose the edge cases with these new unit tests. The tests check that we can replace all floor ops in the input expression with FloorDiv, then they expand FloorDiv back to floor and check equality with the original expression.

Note this PR also requires some MTIA changes to pass internal tests. Those will be stacked onto the imported diff.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

pytorch-bot bot commented Sep 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163828

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1175db1 with merge base 5f90e8c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

@blaine-rister has imported this pull request. If you are a Meta employee, you can view this in D83209451.

@blaine-rister blaine-rister changed the title from "[Inductor-FX] Remove python_slow grid mode" to "[Inductor-FX] Generalize FloorDiv conversion to handle more complex launch grids. Remove python_slow grid mode." Sep 26, 2025
@blaine-rister blaine-rister marked this pull request as ready for review September 26, 2025 03:23

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 29, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@yangw-dev
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@github-actions github-actions bot deleted the brister/fx_no_python_slow branch October 31, 2025 02:15

Labels

ciflow/inductor, ciflow/trunk, Merged, module: inductor, release notes: fx

7 participants