Optimize default CPU path of Convolution with MKLDNN #48885
Closed
Fixes #35937
Summary
This PR optimizes the default CPU path of Convolution with the MKLDNN kernel. Users previously found that MKLDNN Conv underperformed THNN Conv in some circumstances (Issue #35937, PR-40610, PR-46675). In particular, when the kernel size is one or the kernel is significantly larger than the input, the MKLDNN kernel could be 2x slower than THNN.
We've now improved the heuristics of the Conv algorithm selection and cut the overhead in some kernels, so MKLDNN performs the same as or better than THNN.
Benchmark
Unit Tests
Shapes in this script (https://gist.github.com/pinzhenx/8f62d5076bb04f0fd2108380b22dfbaa) are collected from the issues and PRs mentioned above. All the problematic cases have now been fixed by this PR.
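For readers who want to check a single shape locally, here is a minimal sketch of how one Conv2d forward pass might be timed with the MKLDNN path enabled and disabled. It is not the linked script. The helper names `bench` and `compare_conv` and the default 1x1-kernel shape are hypothetical, and it assumes a PyTorch build with MKLDNN support.

```python
import statistics
import time


def bench(fn, warmup=5, iters=20):
    """Return the median wall-clock time of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_conv(in_shape=(1, 64, 56, 56), out_channels=64, kernel_size=1):
    """Time one Conv2d forward pass with MKLDNN enabled and disabled.

    The default shape is a hypothetical 1x1-kernel case, not one taken
    from the gist.
    """
    import torch  # imported lazily so bench() works without PyTorch

    torch.set_num_threads(1)  # single thread, as in the PR's config
    conv = torch.nn.Conv2d(in_shape[1], out_channels, kernel_size).eval()
    x = torch.randn(*in_shape)

    results = {}
    with torch.no_grad():
        for enabled in (True, False):
            # torch.backends.mkldnn.flags toggles the MKLDNN path;
            # with enabled=False the convolution uses the non-MKLDNN CPU path.
            with torch.backends.mkldnn.flags(enabled=enabled):
                key = "mkldnn" if enabled else "fallback"
                results[key] = bench(lambda: conv(x))
    return results
```

Calling `compare_conv()` returns a dict of median times in seconds under the keys `"mkldnn"` and `"fallback"`. Absolute numbers depend on the machine and build, so only the ratio between the two is meaningful.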
Model Tests
As for the models, we tested two variants of resnext and got results comparable to before.
Config: Skylake 8180, batch=1, thread=1, jemalloc
@lly-zero-one @bertmaher @dzhulgakov @ngimel @CaoZhongZ @jgong5