Use stable topological sort in fuse_by_partitions#167397
Use stable topological sort in fuse_by_partitions#167397SherlockNoMad wants to merge 7 commits intogh/SherlockNoMad/18/basefrom
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167397
Note: Links to docs will display an error until the docs builds have been completed. ⏳ 2 Pending, 1 Unrelated FailureAs of commit 059d973 with merge base c6ae757 ( UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
| erase_nodes(gm, sorted_nodes) | ||
|
|
||
| # topological sort original gm with newly created sub_gm | ||
| legalize_graph(gm) |
There was a problem hiding this comment.
legalize_graph performs a topo sort that shuffles the nodes is a global way, making the result unpredictable.
We should avoid this in graph pass in general.
There was a problem hiding this comment.
Another option here is to have a version of legalize graph that does a stable sort
|
I support the motivation for the change but I'm not an expert on the original pass here so I can't review. I do see CI failures as well. Do you want me to try to page this in? |
eellison
left a comment
There was a problem hiding this comment.
If you are doing manipulations by looking at DAG, you need to topo sort after, or do more work to keep it legalized while doing the matching. This is what we discussed in the meeting with @anijain2305.
Maybe stable_topological_sort will work better here ?
| output_node = sub_gm.graph.output_node() | ||
|
|
||
| next_node = module_node.next | ||
| with gm.graph.inserting_before(next_node): |
There was a problem hiding this comment.
OK, I sat down and understood the algorithm. The reason I think this doesn't work is because you can be in this situation:
input0 = ..
input1 = ...
output0 = ...
not_in_subgraph = f(output0)
input2 = ...
output1 = ...
Here, we DO need to do some code motion for the not_in_subgraph node, since it is "inside" the set of nodes we want to fuse and we need to push it to the end. But it's not necessarily obvious if we should push an arbitrary node in the middle of the fusion region to the end or the beginning. If instead it didn't depend on output0 but produced input2, then we would need to move it to the beginning. But I think you can make this work.
There was a problem hiding this comment.
indeed, the UT failure caught this case.
I changed the impl to use stable topo sort instead.
legalize_graph() performs a topo sort that shuffles the nodes is a global way, making the result unpredictable. We should avoid this in graph pass in general. This problem is discovered when testing regional_inductor, a single fuse region trigger the global reordering. Before https://www.internalfb.com/intern/diffing/?before_paste_number=2029217728&after_paste_number=2029218006®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff After https://www.internalfb.com/intern/diffing/?paste_number=2029162294®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff Left is gm before regional_inductor, right is after. cc ezyang EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
|
@pytorchbot revert -m "seems to be breaking executorch signals internally, see D86780724" -c ghfirst |
|
@pytorchbot successfully started a revert job. Check the current status here. |
This reverts commit 7886070. Reverted #167397 on behalf of https://github.com/jeanschmidt due to seems to be breaking executorch signals internally, see D86780724 ([comment](#167397 (comment)))
legalize_graph() performs a topo sort that shuffles the nodes is a global way, making the result unpredictable. We should avoid this in graph pass in general. This problem is discovered when testing regional_inductor, a single fuse region trigger the global reordering. Before https://www.internalfb.com/intern/diffing/?before_paste_number=2029217728&after_paste_number=2029218006®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff After https://www.internalfb.com/intern/diffing/?paste_number=2029162294®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff Left is gm before regional_inductor, right is after. Pull Request resolved: pytorch#167397 Approved by: https://github.com/ezyang
…7397)" This reverts commit 7886070. Reverted pytorch#167397 on behalf of https://github.com/jeanschmidt due to seems to be breaking executorch signals internally, see D86780724 ([comment](pytorch#167397 (comment)))
legalize_graph() performs a topo sort that shuffles the nodes is a global way, making the result unpredictable. We should avoid this in graph pass in general. This problem is discovered when testing regional_inductor, a single fuse region trigger the global reordering. Before https://www.internalfb.com/intern/diffing/?before_paste_number=2029217728&after_paste_number=2029218006®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff After https://www.internalfb.com/intern/diffing/?paste_number=2029162294®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff Left is gm before regional_inductor, right is after. cc ezyang EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
legalize_graph() performs a topo sort that shuffles the nodes is a global way, making the result unpredictable. We should avoid this in graph pass in general. This problem is discovered when testing regional_inductor, a single fuse region trigger the global reordering. Before https://www.internalfb.com/intern/diffing/?before_paste_number=2029217728&after_paste_number=2029218006®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff After https://www.internalfb.com/intern/diffing/?paste_number=2029162294®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff Left is gm before regional_inductor, right is after. cc ezyang EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
|
@pytorchbot rebase |
legalize_graph() performs a topo sort that shuffles the nodes is a global way, making the result unpredictable. We should avoid this in graph pass in general. This problem is discovered when testing regional_inductor, a single fuse region trigger the global reordering. Before https://www.internalfb.com/intern/diffing/?before_paste_number=2029217728&after_paste_number=2029218006®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff After https://www.internalfb.com/intern/diffing/?paste_number=2029162294®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff Left is gm before regional_inductor, right is after. cc ezyang EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
legalize_graph() performs a topo sort that shuffles the nodes is a global way, making the result unpredictable. We should avoid this in graph pass in general. This problem is discovered when testing regional_inductor, a single fuse region trigger the global reordering. Before https://www.internalfb.com/intern/diffing/?before_paste_number=2029217728&after_paste_number=2029218006®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff After https://www.internalfb.com/intern/diffing/?paste_number=2029162294®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff Left is gm before regional_inductor, right is after. cc ezyang EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
legalize_graph() performs a topo sort that shuffles the nodes is a global way, making the result unpredictable. We should avoid this in graph pass in general. This problem is discovered when testing regional_inductor, a single fuse region trigger the global reordering. Before https://www.internalfb.com/intern/diffing/?before_paste_number=2029217728&after_paste_number=2029218006®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff After https://www.internalfb.com/intern/diffing/?paste_number=2029162294®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff Left is gm before regional_inductor, right is after. cc ezyang EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
| "get_node_target", | ||
| "is_node_output_tensor", | ||
| "legalize_graph", | ||
| "stable_topological_sort", |
There was a problem hiding this comment.
New entries here are not ok. Please make sure to either properly document the function on the website or make it private by prepending an _
| "Tuple", | ||
| "compatibility", | ||
| "legalize_graph", | ||
| "stable_topological_sort", |
There was a problem hiding this comment.
New entries here are not ok either. Make the API properly public.
Pull Request resolved: #169498 Approved by: https://github.com/albanD ghstack dependencies: #167397
|
I would prefer we don't revert this, the forward fix for D88319332 is #169541 |
legalize_graph() performs a topo sort that shuffles the nodes is a global way, making the result unpredictable. We should avoid this in graph pass in general. This problem is discovered when testing regional_inductor, a single fuse region trigger the global reordering. Before https://www.internalfb.com/intern/diffing/?before_paste_number=2029217728&after_paste_number=2029218006®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff After https://www.internalfb.com/intern/diffing/?paste_number=2029162294®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff Left is gm before regional_inductor, right is after. Pull Request resolved: #167397 Approved by: https://github.com/ezyang
Pull Request resolved: #169498 Approved by: https://github.com/albanD ghstack dependencies: #167397
Stack from ghstack (oldest at bottom):
legalize_graph() performs a topo sort that shuffles the nodes is a global way, making the result unpredictable.
We should avoid this in graph pass in general.
This problem is discovered when testing regional_inductor, a single fuse region trigger the global reordering.
Before
https://www.internalfb.com/intern/diffing/?before_paste_number=2029217728&after_paste_number=2029218006®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff
After
https://www.internalfb.com/intern/diffing/?paste_number=2029162294®ex_remove_pattern=&enable_regex_remove=0&strip_empty_lines=0&line_wrap=0&selected_tab=plain_diff
Left is gm before regional_inductor, right is after.
cc @ezyang @EikanWang @jgong5 @wenzhe-nrv