Fix max_pool2d perf regression #41174
Closed
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The two pointer variables
ptr_top_diffandptr_top_maskwere introduced in #38953. Some end-to-end tests showed training performance regression due to this change. The performance is restored after removing the two pointer variables, and adding offset directly below in the indexing [ ] calculations.See PR change https://github.com/pytorch/pytorch/pull/38953/files#diff-8085d370f4e98295074a51b8a1f829e9R187-R188
pytorch/aten/src/ATen/native/cuda/DilatedMaxPool2d.cu
Lines 186 to 195 in e4a3c58