DNNL: enable conv3d #35662
Conversation
[ghstack-poisoned]
💊 CI failures summary and remediations. As of commit 91a0981 (more details on the Dr. CI page):
❄️ 1 failure tentatively classified as flaky, but reruns have not yet been triggered to confirm:
@ngimel, @VitalyFedyunin, this PR is about enabling DNNL 3d ops, including conv, pooling, and batchnorm. For resnext3d-101, tested on the real dataset UCF101 (input size 10x3x32x128x170), we get a ~13x performance improvement compared to the native CPU path on SKX-8180. You can see the details in resnext3d-101. Thanks!
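For readers trying this out, here is a rough usage sketch, under the assumption that `torch.utils.mkldnn.to_mkldnn` picks up Conv3d modules once this PR lands; the model and input shapes are illustrative only:

```python
import torch
import torch.utils.mkldnn as mkldnn_utils

# Convert a float, eval-mode model to the MKLDNN blocked layout for CPU inference.
model = torch.nn.Conv3d(3, 64, kernel_size=3, padding=1).eval()
mkldnn_model = mkldnn_utils.to_mkldnn(model)

# Inputs are NxCxDxHxW; convert them to MKLDNN tensors on the way in
# and back to dense tensors on the way out.
x = torch.randn(10, 3, 32, 128, 170)
y = mkldnn_model(x.to_mkldnn()).to_dense()
```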
[ghstack-poisoned]
@VitalyFedyunin -- could you review this PR?
[ghstack-poisoned]
@VitalyFedyunin, please help review this code, thanks!
@lly-zero-one Could you comment on the perf side?
We have a few internal changes to the current Conv3d implementation for performance improvement, which will be upstreamed next week, so I am wondering whether we could do a full performance benchmark. For the 2d case, we found that the mkldnn conv is 2x slower than the native implementation on a specific production model (I will file a repro).
A 2x performance diff with the native implementation is serious... In the future, if you have similar issues, you can also raise them in the Teams channel and we will get hands on it asap.
#35937 is for tracking the issue.
VitalyFedyunin left a comment
This is inconsistent with the approach we use for operator naming: we always explicitly specify 1d, 2d, and 3d operators and let the Python nn module dispatch to the proper one.
With this approach you will not only follow the convention, but also avoid introducing backwards-incompatible changes.
This comment applies to all PRs in the stack.
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
Summary:
- Bump DNNL to 1.5
- Bug fixes and improvements in ideep
- Suppress g++ -Wreorder warning
- Avoid rebuilding `libmkldnn.so` (uxlfoundation/oneDNN#743)
- Enable conv3d (integration code was checked in by Xiaobing, #35662)

Pull Request resolved: #40088
Differential Revision: D22071530
Pulled By: albanD
fbshipit-source-id: e7a53d7421e8a7a03e36a7dfb68edc565a2f00df
[ghstack-poisoned]
@ngimel, please help merge those PRs, thanks!
torch/utils/mkldnn.py
Outdated
    self.bias = state[1].to_mkldnn()
    self.training = state[2]

class MkldnnConv3d(torch.jit.ScriptModule):
FYI I believe this is kind of a legacy API as we now compile nn Modules recursively. Cc @eellison
Yeah, correct, it's better to inherit from torch.nn.Module; you shouldn't need any other changes.
Thanks, I will change it, and also the other cases as a next step.
@eellison, if it is inherited from torch.nn.Module, there will be a problem with the torch.jit.save method: for an MKLDNN module, the parameters are MKLDNN tensors, which are opaque tensors (they do not have storage), so we first call .to_dense() in __getstate__ to save this script module. I will change it once this problem is solved. Thanks!
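A minimal sketch of the round trip being described, at the tensor level (illustrative only, not the actual torch/utils/mkldnn.py code): opaque MKLDNN tensors cannot be serialized directly, so the module saves a dense copy in `__getstate__` and reorders back to the MKLDNN layout in `__setstate__`.

```python
import torch

# Illustrative only: MKLDNN (opaque) tensors have no storage, so save a dense
# copy and reorder back to the MKLDNN layout after loading.
w = torch.randn(16, 3, 3, 3, 3)      # a float conv3d weight
w_mkldnn = w.to_mkldnn()             # opaque, blocked-layout tensor

state = w_mkldnn.to_dense()          # what __getstate__ would return
torch.save(state, "weight.pt")       # plain dense tensors serialize normally

restored = torch.load("weight.pt").to_mkldnn()   # what __setstate__ would do
assert torch.allclose(restored.to_dense(), w)
```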
torch/utils/mkldnn.py
Outdated
    @torch.jit.script_method
    def forward(self, x):
        return torch.conv3d(
What would happen here if the parameters are not supported by mkldnn (use_mkldnn would return false, e.g. because of dilation, or because x is the wrong type), but the weight is already reordered?
Also, suppose mkldnn_convolution is indeed called from Convolution.cpp, what happens next? In mkldnn_convolution there's only
ideep::tensor mkldnn_output = _mkldnn_conv2d(
    mkldnn_input,
    mkldnn_weight,
    mkldnn_bias,
    padding,
    stride,
    dilation,
    groups);
If it is able to handle conv3d, then at the very least it is confusingly named.
For the first question, DNNL also supports dilation for convNd, and I can enable it, so there is only one case not supported by DNNL: x is not a float tensor. But in this case the weight is also a float tensor, so an error will be reported to the user when reordering the weight, because the user needs to call **.to_mkldnn()** first, which checks the tensor's type.
For the second question, yes, the name is confusing; I will change it.
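For reference, the failure mode described above can be reproduced at the tensor level (assuming `.to_mkldnn()` keeps rejecting non-float CPU tensors, as described in this thread):

```python
import torch

# to_mkldnn() checks the dtype, so a double weight is rejected before any
# convolution is attempted.
try:
    torch.randn(8, 3, 3, 3, 3, dtype=torch.double).to_mkldnn()
except RuntimeError as e:
    print("reorder rejected non-float tensor:", e)
```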
OK, please enable dilated convolution then. I agree that for correct user inputs this situation should not happen, but even in the case of incorrect user inputs (the user had a float weight when creating the module, but sends a double tensor, or a CUDA tensor, to forward), the error message should be clear and helpful, and I don't know what will happen in this case.
DNNL dilated conv is enabled now. Thanks!
Differential Revision: [D22102408](https://our.internmc.facebook.com/intern/diff/D22102408) [ghstack-poisoned]
  return (input.is_mkldnn()) || // input is mkldnn Tensor
    (input.options().backend() == at::Backend::CPU &&
     input.scalar_type() == kFloat && // only on CPU Float Tensors
     !is_dilated() && // doesn't support dilation
Should you also remove the is_dilated check from here if dilation is actually supported by mkldnn?
There is another PR to do that, see #40220.
    IntArrayRef dilation,
    int64_t groups) {

  auto stride_vec = expand_param_if_needed(stride, "stride", 3);
Out of curiosity, why do you need to expand stride etc. here, but don't need to do it for 2d conv? If it were not for these expansion calls, the reorder_weight functions would be exactly the same for 2d and 3d.
Yes, we don't need to expand them; they have already been expanded in pytorch/torch/nn/modules/conv.py, lines 402 to 405 in 13bd599:
    kernel_size = _pair(kernel_size)
    stride = _pair(stride)
    padding = _pair(padding)
    dilation = _pair(dilation)
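For reference, the expansion the Python modules already perform is roughly this, using the internal `_pair`/`_triple` helpers (Conv3d uses `_triple`):

```python
from torch.nn.modules.utils import _pair, _triple

# Scalars are broadcast to a tuple of the right length; tuples pass through.
print(_pair(2))            # (2, 2)      -- what Conv2d does with stride=2
print(_triple(2))          # (2, 2, 2)   -- what Conv3d does with stride=2
print(_triple((1, 2, 3)))  # (1, 2, 3)
```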
Currently, if I send a tensor of the wrong type (e.g. double) to mkldnn convolution, I get an error of the wrong error type, thrown by TORCH_INTERNAL_ASSERT. It should be TORCH_CHECK instead.
Differential Revision: [D22102408](https://our.internmc.facebook.com/intern/diff/D22102408) [ghstack-poisoned]
Changed to TORCH_CHECK now.
Summary:
- Bump DNNL to 1.5
- Bug fixes and improvements in ideep
- Suppress g++ -Wreorder warning
- Avoid rebuilding `libmkldnn.so` (uxlfoundation/oneDNN#743)
- Enable conv3d (integration code was checked in by Xiaobing, pytorch#35662)

Pull Request resolved: pytorch#40088
Differential Revision: D22071530
Pulled By: albanD
fbshipit-source-id: e7a53d7421e8a7a03e36a7dfb68edc565a2f00df
@VitalyFedyunin merged this pull request in 6ba807c.
Hi @XiaobingSuper, we had to revert this stack (see the https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master logs); could you please create new PRs?
Stack from ghstack:
Differential Revision: D22102408