[pyper] to + lengths_to_offsets #73879
CI failures summary (Dr. CI), as of commit 1ce8e40: 1 new failure recognized by patterns. The failure does not appear to be due to upstream breakages.
This pull request was exported from Phabricator. Differential Revision: D34696255
Summary: Pull Request resolved: pytorch#73879

Fuse the following pattern:
```
%1994 : Tensor = aten::to(%getattr_78.1, %188, %189, %189) # <eval_with_key>.50:11:0
%1995 : Tensor = fb::lengths_to_offsets(%1994, %190) # /mnt/xarfuse/uid-1994
```
This pattern is applied after all the applicable clip_ranges+gather_ranges patterns.

Additional context in https://fb.quip.com/DSCbAozMBwUi

Test Plan:
> ./caffe2/caffe2/fb/predictor/scripts/run_disagg_model_benchmarks.sh 321004917 27 /data/users/ansha/tmp/ads_tail sr_only

~0.007ms overall reduction in tail model runtime
(321004917_27 oemae_long_attr_win_2d_7d_aux_model)

**Local (25 fused nodes)**
Before: 2.04ms/iter
0.0112739 ms. 0.543996%. fb::lengths_to_offsets (31 nodes, out variant)
0.00805597 ms. 0.388722%. static_runtime::to_maybe_copy_out (30 nodes, out variant)
After: 1.96256ms/iter
0.0100853 ms. 0.498655%. fb::to_lengths_to_offsets (25 nodes, out variant)
0.00328385 ms. 0.157536%. fb::lengths_to_offsets (6 nodes, out variant)
0.00239722 ms. 0.115002%. static_runtime::to_maybe_copy_out (5 nodes, out variant)

**Local_RO (43 fused nodes)**
Before: 0.11427ms/iter
0.0110696 ms. 9.42255%. fb::lengths_to_offsets (43 nodes, out variant)
0.00638323 ms. 5.43349%. static_runtime::to_maybe_copy_out (43 nodes, out variant)
After: 0.112098ms/iter
0.014206 ms. 12.6795%. fb::to_lengths_to_offsets (43 nodes, out variant)

**Remote_RO (17 fused nodes)**
Before: 0.24ms/iter
0.0534883 ms. 23.0586%. static_runtime::to_maybe_copy_out (136 nodes, out variant)
0.00216992 ms. 0.935446%. fb::lengths_to_offsets (17 nodes, out variant)
After: 0.240225ms/iter
0.0525392 ms. 23.2864%. static_runtime::to_maybe_copy_out (119 nodes, out variant)
0.00265347 ms. 1.17607%. fb::to_lengths_to_offsets (17 nodes, out variant)

**Remote_Other (3 fused nodes)**
Not much effect

Differential Revision: D34696255
fbshipit-source-id: d9b07a9f7f3b3ec83584305295de5cdad538abc9
Summary: Pull Request resolved: #73879
Reviewed By: mikeiovine
Differential Revision: D34696255
fbshipit-source-id: a0dc4a8ff8f25a825f6dc371ec5e4b3b09740c29
(cherry picked from commit a49b482)
This pull request has been reverted by 91a72e9. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
Summary:
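Fuse the following pattern: an `aten::to` node whose output is consumed by an `fb::lengths_to_offsets` node. The pair is replaced by a single `fb::to_lengths_to_offsets` node.
This fusion is applied after all the applicable clip_ranges+gather_ranges patterns.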
Additional context in https://fb.quip.com/DSCbAozMBwUi
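As a rough illustration of what fusing the two ops saves, here is a minimal pure-Python sketch. The semantics are assumptions, not taken from this PR: `to_int` stands in for the dtype conversion done by `aten::to`, `lengths_to_offsets` is assumed to compute an exclusive prefix sum over the lengths, and the `include_last_offset` flag is a guess at the meaning of the second argument. None of this is the actual `fb::` kernel.

```python
from itertools import accumulate
from typing import List, Sequence


def to_int(values: Sequence[float]) -> List[int]:
    """Hypothetical stand-in for aten::to: materializes a converted copy."""
    return [int(v) for v in values]


def lengths_to_offsets(lengths: Sequence[int],
                       include_last_offset: bool = False) -> List[int]:
    """Assumed semantics: exclusive prefix sum of the segment lengths."""
    offsets = [0] + list(accumulate(lengths))
    return offsets if include_last_offset else offsets[:-1]


def to_lengths_to_offsets(values: Sequence[float],
                          include_last_offset: bool = False) -> List[int]:
    """Fused sketch: convert each element while accumulating.

    No intermediate converted list is built, which is the kind of saving
    a fused node can offer over two separate nodes.
    """
    offsets: List[int] = []
    running = 0
    for v in values:
        offsets.append(running)
        running += int(v)
    if include_last_offset:
        offsets.append(running)
    return offsets


lengths = [2.0, 3.0, 1.0]
unfused = lengths_to_offsets(to_int(lengths))
fused = to_lengths_to_offsets(lengths)
print(unfused, fused)
```

Under these assumptions the fused and unfused paths return the same offsets. The fused version makes a single pass and skips the temporary produced by the conversion step.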
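Test Plan:
> ./caffe2/caffe2/fb/predictor/scripts/run_disagg_model_benchmarks.sh 321004917 27 /data/users/ansha/tmp/ads_tail sr_only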
~0.007ms overall reduction in tail model runtime
(321004917_27 oemae_long_attr_win_2d_7d_aux_model)
Local (25 fused nodes)
Before: 2.04ms/iter
0.0112739 ms. 0.543996%. fb::lengths_to_offsets (31 nodes, out variant)
0.00805597 ms. 0.388722%. static_runtime::to_maybe_copy_out (30 nodes, out variant)
After: 1.96256ms/iter
0.0100853 ms. 0.498655%. fb::to_lengths_to_offsets (25 nodes, out variant)
0.00328385 ms. 0.157536%. fb::lengths_to_offsets (6 nodes, out variant)
0.00239722 ms. 0.115002%. static_runtime::to_maybe_copy_out (5 nodes, out variant)
Local_RO (43 fused nodes)
Before: 0.11427ms/iter
0.0110696 ms. 9.42255%. fb::lengths_to_offsets (43 nodes, out variant)
0.00638323 ms. 5.43349%. static_runtime::to_maybe_copy_out (43 nodes, out variant)
After: 0.112098ms/iter
0.014206 ms. 12.6795%. fb::to_lengths_to_offsets (43 nodes, out variant)
Remote_RO (17 fused nodes)
Before: 0.24ms/iter
0.0534883 ms. 23.0586%. static_runtime::to_maybe_copy_out (136 nodes, out variant)
0.00216992 ms. 0.935446%. fb::lengths_to_offsets (17 nodes, out variant)
After: 0.240225ms/iter
0.0525392 ms. 23.2864%. static_runtime::to_maybe_copy_out (119 nodes, out variant)
0.00265347 ms. 1.17607%. fb::to_lengths_to_offsets (17 nodes, out variant)
Remote_Other (3 fused nodes)
Not much effect
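One way to read the "~0.007ms overall reduction" figure is as the sum, across the three measured configurations, of the time saved in the affected ops. That interpretation is an assumption; the per-op numbers below are copied from the listing above, and they do add up to about 0.0073 ms.

```python
# Per-op times in ms, copied from the benchmark listing above.
# "before" = lengths_to_offsets + to_maybe_copy_out;
# "after" additionally includes the fused to_lengths_to_offsets op.
before = {
    "Local": [0.0112739, 0.00805597],
    "Local_RO": [0.0110696, 0.00638323],
    "Remote_RO": [0.0534883, 0.00216992],
}
after = {
    "Local": [0.0100853, 0.00328385, 0.00239722],
    "Local_RO": [0.014206],
    "Remote_RO": [0.0525392, 0.00265347],
}

# Time saved per configuration in the affected ops only.
saved = {name: sum(before[name]) - sum(after[name]) for name in before}
total_saved = sum(saved.values())

for name, ms in saved.items():
    print(f"{name}: {ms:.5f} ms saved")
print(f"total: {total_saved:.5f} ms saved")
```

Most of the saving comes from Local and Local_RO. In Remote_RO the affected ops change by less than 0.001 ms, which is consistent with the nearly unchanged per-iteration time reported there.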
Differential Revision: D34696255