Skip to content

[inductor] bugfix: keep WeakDeps (WAR deps) during fusion#162316

Closed
v0i0 wants to merge 12 commits intogh/v0i0/9/basefrom
gh/v0i0/9/head
Closed

[inductor] bugfix: keep WeakDeps (WAR deps) during fusion#162316
v0i0 wants to merge 12 commits intogh/v0i0/9/basefrom
gh/v0i0/9/head

Conversation

@v0i0
Copy link
Contributor

@v0i0 v0i0 commented Sep 6, 2025

Stack from ghstack (oldest at bottom):

fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

@pytorch-bot
Copy link

pytorch-bot bot commented Sep 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162316

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 9da88e3 with merge base a89d5e9 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

v0i0 added a commit that referenced this pull request Sep 6, 2025
…EMOVE during fusion

ghstack-source-id: 7c5791f
Pull Request resolved: #162316
@v0i0
Copy link
Contributor Author

v0i0 commented Sep 6, 2025

fixes #159855 will try to understand the surrounding code a bit more to see if there is a better way

@v0i0 v0i0 requested a review from eellison September 8, 2025 16:39
Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a repro ? For write after read, we should be checking that the indexing expressions are exactly equal:

# StarDep doesn't match MemoryDep, different indices don't match
# However, broadcasting sometimes strips dimensions, and if that's the case
# we still can match unmet dep
# if there's indirect indexing, don't match it
def fusable_read_and_write(self, read: Dep, write: MemoryDep) -> bool:
if isinstance(read, MemoryDep):
read_name = self.mutation_renames.get(read.name, read.name)
if (
read_name != write.name
or free_symbol_is_type(read.index, SymT.TMP)
or free_symbol_is_type(write.index, SymT.TMP)
):
return False
if config.loop_ordering_after_fusion and read.num_vars != write.num_vars:
# Need merge loops if we do loop ordering after fusion since
# we have not merged the loops yet when creating the scheduler
# nodes.
read = read.normalize()
write = write.normalize()
return (
read.index == write.index
and len(read.size) >= len(write.size)
and read.size[: len(write.size)] == write.size
)
elif isinstance(read, StarDep):
read_name = self.mutation_renames.get(read.name, read.name)
write_name = self.mutation_renames.get(write.name, write.name)
if (
read.mode == write.mode
and write.mode is not None
and read_name == write_name
):
return True
return False

Original pr here: #118210

@v0i0
Copy link
Contributor Author

v0i0 commented Sep 8, 2025

@eellison i have a repro, it is just a little complicated (not quite sure yet how to get a smaller repro, the obvious thing does not work). It seems to only trigger if node2 has MutationLayoutSHOULDREMOVE, which seems to hide the fact that the input buffer of node1 is the same as the output buffer of node2.

…youtSHOULDREMOVE during fusion"

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
v0i0 added a commit that referenced this pull request Sep 8, 2025
…EMOVE during fusion

ghstack-source-id: ea3bff0
Pull Request resolved: #162316
…youtSHOULDREMOVE during fusion"

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
v0i0 added a commit that referenced this pull request Sep 8, 2025
…EMOVE during fusion

ghstack-source-id: 01a8eb2
Pull Request resolved: #162316
@v0i0
Copy link
Contributor Author

v0i0 commented Sep 8, 2025

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Sep 8, 2025
…youtSHOULDREMOVE during fusion"

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
v0i0 added a commit that referenced this pull request Sep 8, 2025
…EMOVE during fusion

ghstack-source-id: 0bbab8e
Pull Request resolved: #162316
@v0i0 v0i0 requested a review from eellison September 8, 2025 21:56
Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice ! for posterity - mind updating the commit message with details abt bug/fix ? (and what test case exercises this with your changes)

@v0i0 v0i0 changed the title [inductor] bugfix: consider WAR dependency from MutatingLayoutSHOULDREMOVE during fusion [inductor] bugfix: keep WeakDeps (WAR deps) during fusion Sep 8, 2025
fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

[ghstack-poisoned]
v0i0 added a commit that referenced this pull request Sep 8, 2025
fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

ghstack-source-id: 0bbab8e
Pull Request resolved: #162316
fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

[ghstack-poisoned]
v0i0 added a commit that referenced this pull request Sep 9, 2025
fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

ghstack-source-id: effb654
Pull Request resolved: #162316
@v0i0
Copy link
Contributor Author

v0i0 commented Sep 9, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 9, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 5, 5, linux.g6.4xlarge.experimental.nvidia.gpu)

Details for Dev Infra team Raised by workflow job

@v0i0
Copy link
Contributor Author

v0i0 commented Sep 11, 2025

fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
v0i0 added a commit that referenced this pull request Sep 12, 2025
fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

ghstack-source-id: 46130ea
Pull Request resolved: #162316
@v0i0
Copy link
Contributor Author

v0i0 commented Sep 12, 2025

fixed up the foreach heuristic a bit better, we now only treat them different in the sense that if the weekdep is fusable into more than one of the subnodes, we retain it. this feels like the right way to do it?

regression again: https://github.com/pytorch/pytorch/actions/runs/17667209063

fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
v0i0 added a commit that referenced this pull request Sep 12, 2025
fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

ghstack-source-id: e3c79c0
Pull Request resolved: #162316
fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
v0i0 added a commit that referenced this pull request Sep 17, 2025
fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

ghstack-source-id: 078ec9f
Pull Request resolved: #162316
fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
v0i0 added a commit that referenced this pull request Sep 17, 2025
fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

ghstack-source-id: a9015a5
Pull Request resolved: #162316
@shunting314
Copy link
Contributor

I stated a perf test for h100 here. @v0i0 . Perf looks fine. Timm has perf regression probably because more models pass the accuracy test.

@v0i0
Copy link
Contributor Author

v0i0 commented Sep 19, 2025

i checked out what the biggest regression was according to the regression run, and it neither reproduced nor did tlparse diff shown a difference in inductor. as such i'll fix it up and it should be good to go

[ghstack-poisoned]
v0i0 added a commit that referenced this pull request Sep 19, 2025
fixes #159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in #118210.

ghstack-source-id: 1754708
Pull Request resolved: #162316
@shunting314 shunting314 self-requested a review September 19, 2025 18:16
@v0i0
Copy link
Contributor Author

v0i0 commented Sep 21, 2025

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…2316)

fixes pytorch#159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in pytorch#118210.

Pull Request resolved: pytorch#162316
Approved by: https://github.com/eellison, https://github.com/mlazos, https://github.com/shunting314
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…2316)

fixes pytorch#159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in pytorch#118210.

Pull Request resolved: pytorch#162316
Approved by: https://github.com/eellison, https://github.com/mlazos, https://github.com/shunting314
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…2316)

fixes pytorch#159855, was not triggered in other tests since it took
more than one round of fusion to get to the problematic code
which prunes WeakDeps. The WeakDeps are important to inhibit
fusion of kernels that read/write data into mutated buffers
with different indexing.

We modify the code to a) always prune before fusion, rather
than after, which improves its coverage and makes our basic
vertical fusion tests surface this issue as well and b)
check whether the weak dep is fusable before eliminating it
(which basically means checking that the producing code and
the consuming code are sufficiently compatible).

The tests that trigger this with change (a) is:
test_fusing_write_into_disjoint_read introduced in pytorch#118210.

Pull Request resolved: pytorch#162316
Approved by: https://github.com/eellison, https://github.com/mlazos, https://github.com/shunting314
@github-actions github-actions bot deleted the gh/v0i0/9/head branch October 22, 2025 02:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants