[MPS] Fix conv layout handling (#162776)
⏳ No Failures, 38 Pending — as of commit 337d8ee with merge base eb3fbf5. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
@pytorchbot merge -f "Lint + MPS tests are green"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
What started as a simple fix for `mps_convolution_backward_input` turned into a fairly significant refactor with several fixes:

- Updated `mps_conv_use_channels_last` to return channels-last output if either the input or the weights are channels last
- Use the same primitive throughout `Convolution.mm` to determine whether the output should be allocated in channels-last format or not

But doing only those two resulted in a crash in `test_memory_format_nn_Conv2d_mps_float32` when the weights are channels last and a bias is present:
```
% python -c "import torch;print(torch.nn.functional.conv2d(torch.rand(2, 4, 3, 4,device='mps'), torch.rand(5, 4, 3, 3,device='mps').to(memory_format=torch.channels_last), torch.rand(5,device='mps')))"
/AppleInternal/Library/BuildRoots/4~B5E4ugDCh2RsPWAjMEoPu8LC5w1yXEwd7XweDhg/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:3619: failed assertion `Error: MLIR pass manager failed'
zsh: abort      python -c
```
which required a more thorough redesign/cleanup, namely:

- Do not alter the layout based on the macOS version; instead, perform additional copies on macOS 14 if the inputs, output, or weights are in channels-last format (done by defining `std::optional<Tensor> output_c;` that holds a contiguous copy of the output tensor)
- Introduced `input_suggested_layout`, which is set to ChannelsLast if and only if the input is channels last and the code is running on macOS 15+
- Deleted the unused `memory_layout` and `group` arguments from `fill_depthwise_conv_desc`
- Fixed the bias broadcasting logic for channels last

As a result, in addition to adding one more regression test, this change removes `expectedFailure` from:

- `TestModule.test_memory_format` for `Conv2d`, `ConvTranspose2d`, `LazyConv1d`, `LazyConvTranspose1d`
- `test_require_stride_expanded_dynamic_shapes`
- `test_mutable_custom_op_fixed_layout2` for macOS 14

Fixes #161905

Pull Request resolved: #162776
Approved by: https://github.com/Skylion007
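For context on why layout handling matters here: a channels-last tensor keeps the logical NCHW shape but stores channels with stride 1 in memory, so a kernel that assumes contiguous (NCHW) strides will misread the data. The sketch below is plain Python (no PyTorch required) showing the two stride layouts for the shapes used in the repro above; the helper names are illustrative and not part of this PR.

```python
# Sketch of the two memory layouts for a 4-D tensor of logical shape (N, C, H, W).
def contiguous_strides(n, c, h, w):
    # NCHW (torch.contiguous_format): W varies fastest in memory
    return (c * h * w, h * w, w, 1)

def channels_last_strides(n, c, h, w):
    # NHWC in memory (torch.channels_last): the channel dim has stride 1,
    # while the logical shape stays (N, C, H, W)
    return (h * w * c, 1, w * c, c)

# Shapes from the repro: input (2, 4, 3, 4)
print(contiguous_strides(2, 4, 3, 4))     # (48, 12, 4, 1)
print(channels_last_strides(2, 4, 3, 4))  # (48, 1, 16, 4)
```

This is why a channels-last weight combined with a contiguous input needs either an explicit layout decision (the `input_suggested_layout` introduced here) or a contiguous copy (the `output_c` fallback on macOS 14).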
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben