
Conversation

Contributor

@vfdev-5 vfdev-5 commented Mar 13, 2023

Closed in favor of this stack: #96848


Description

  • Improved performance of vectorized interpolate for the uint8 RGB case
    • unified the RGB and RGBA processing code so that RGB input is not copied into RGBA
  • Performance is now closer to Pillow-SIMD
  • RGBA-case performance is unchanged after the refactoring (see the Source link below)
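For context, the bilinear-with-antialiasing path being optimized here follows Pillow-style separable resampling: for each output pixel it precomputes a window of input pixels and fixed-point integer weights, which the SIMD kernels then apply. A minimal pure-Python sketch of that weight computation (names such as `compute_weights` and `precision` are illustrative, not the PR's actual identifiers):

```python
def bilinear_filter(x):
    # triangle (tent) filter with support 1
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0

def compute_weights(in_size, out_size, precision=15, antialias=True):
    """Per-output-pixel integer weights, Pillow-style (illustrative sketch)."""
    scale = in_size / out_size
    # antialiasing widens the filter when downsampling
    filterscale = max(scale, 1.0) if antialias else 1.0
    support = 1.0 * filterscale  # bilinear support is 1
    result = []
    for xx in range(out_size):
        center = (xx + 0.5) * scale
        xmin = max(int(center - support + 0.5), 0)
        xmax = min(int(center + support + 0.5), in_size)
        ws = [bilinear_filter((x + 0.5 - center) / filterscale)
              for x in range(xmin, xmax)]
        total = sum(ws)
        # fixed-point weights: kernels accumulate in int and shift back down
        iw = [round(w / total * (1 << precision)) for w in ws]
        result.append((xmin, iw))
    return result
```

Each output pixel then becomes the shifted sum of `weight * input_pixel` over its window, which is what the AVX2 kernels below vectorize across channels.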

Results

[------------------------------------------------------------------------------------------ Resize -----------------------------------------------------------------------------------------]
                                                                 |  Pillow (9.0.0.post1)  |  torch (2.1.0a0+git1d3a939) PR  |  torch (2.1.0a0+git5309c44) nightly  |  Speed-up: PR vs nightly
1 threads: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=True     |          38.5          |                56.3             |                 132.5                |            2.4
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=False    |                        |                36.2             |                 110.6                |            3.1
      3 torch.uint8 channels_last bilinear 256 -> 224 aa=True    |         127.0          |               149.9             |                 292.2                |            1.9
      3 torch.uint8 channels_last bilinear 256 -> 224 aa=False   |                        |               134.2             |                 276.8                |            2.1
      3 torch.uint8 channels_last bilinear 256 -> 320 aa=True    |         178.1          |               200.3             |                 416.4                |            2.1
      3 torch.uint8 channels_last bilinear 256 -> 320 aa=False   |                        |               198.0             |                 414.4                |            2.1
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=True     |         112.9          |               129.3             |                 441.3                |            3.4
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=False    |                        |                54.9             |                 364.2                |            6.6
      3 torch.uint8 channels_last bilinear 520 -> 224 aa=True    |         282.7          |               324.8             |                 691.6                |            2.1
      3 torch.uint8 channels_last bilinear 520 -> 224 aa=False   |                        |               211.9             |                 583.1                |            2.8
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=True     |         185.9          |               201.1             |                 783.1                |            3.9
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=False    |                        |                72.1             |                 649.8                |            9.0
      3 torch.uint8 channels_last bilinear 712 -> 224 aa=True    |         408.7          |               436.7             |                1100.5                |            2.5
      3 torch.uint8 channels_last bilinear 712 -> 224 aa=False   |                        |               268.8             |                 906.6                |            3.4

Source

Context

- #90771

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10


pytorch-bot bot commented Mar 13, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96651

Note: Links to docs will display an error until the docs builds have been completed.

❌ 10 Failures

As of commit c7a7f13:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base f418e1f:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Mar 13, 2023
@vfdev-5 vfdev-5 changed the title Improved perfs for vectorized interpolate uint8 RGB-case Improved perfs for vectorized interpolate cpu uint8 RGB-case Mar 13, 2023
@vfdev-5 vfdev-5 added the topic: not user facing topic category label Mar 13, 2023
```
auto yout = unpacked_output.size(1);
TORCH_INTERNAL_ASSERT(num_channels == unpacked_input.size(0));

auto xout_stride = xout * num_channels;
```
Collaborator


Why is this a template parameter when num_channels is never used in a constexpr context? Do you expect ImageResampleVerticalConvolution8u to always be inlined and specialized?

Contributor Author


I rechecked the assembly and re-ran the benchmarks, and I do not see any advantage to using a template for ImageResampleVerticalConvolution8u. I will remove it in the other PR.


```
// Define various shuffling masks
const auto kmask_low = _mm256_set_epi8(
    11, 10, 9, 8, 11, 10, 9, 8, 11, 10, 9, 8, 11, 10, 9, 8,
```
Collaborator


These need descriptive names; what shuffle are they performing?

```
mmk = _mm256_shuffle_epi8(ksource, _mm256_set_epi8(
    11,10, 9,8, 11,10, 9,8, 11,10, 9,8, 11,10, 9,8,
    3,2, 1,0, 3,2, 1,0, 3,2, 1,0, 3,2, 1,0));
auto sss256 = _mm256_set1_epi32(1 << (coefs_precision - 2));
```
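To answer the "what shuffle" question in general terms: `_mm256_shuffle_epi8` gathers bytes independently within each 128-bit lane. A pure-Python model of its semantics (the function name `mm256_shuffle_epi8` is illustrative) shows that this particular mask broadcasts one 4-byte group, i.e. two 16-bit coefficients, across each lane:

```python
def mm256_shuffle_epi8(src, mask):
    """Pure-Python model of _mm256_shuffle_epi8: within each 128-bit
    lane, dst[i] = src[lane_base + (mask[i] & 0x0F)], or 0 when the
    mask byte has its high bit set. src/mask are 32-byte sequences."""
    out = bytearray(32)
    for lane in (0, 16):
        for i in range(16):
            m = mask[lane + i]
            out[lane + i] = 0 if m & 0x80 else src[lane + (m & 0x0F)]
    return bytes(out)

# The mask above, written low byte first (_mm256_set_epi8 lists its
# arguments high-to-low): the low lane repeats bytes 0..3, the high
# lane repeats its lane-relative bytes 8..11 (absolute bytes 24..27).
mask = bytes([0, 1, 2, 3] * 4 + [8, 9, 10, 11] * 4)

src = bytes(range(32))
shuffled = mm256_shuffle_epi8(src, mask)
# Each lane now holds one 4-byte coefficient pair broadcast 4x, ready
# to be multiplied against interleaved pixel pairs by maddubs.
```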
Collaborator


Can you make this a stacked PR, with the style changes in a separate PR? It would make the diff much easier to read.

Contributor Author


Closed in favor of this stack: #96848

@vadimkantorov
Contributor

Is int8 also supported via the same codepath? #5580

@bdhirsh bdhirsh added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Mar 13, 2023
@vfdev-5 vfdev-5 marked this pull request as draft March 15, 2023 10:40
vfdev-5 added a commit that referenced this pull request Mar 15, 2023
- Based on #96651
- Fixed mem pointer alignment

[ghstack-poisoned]
@vfdev-5 vfdev-5 closed this Mar 15, 2023
@vfdev-5 vfdev-5 deleted the interp_uint8_rgb_vec_no_copy branch March 15, 2023 14:05
Contributor Author

vfdev-5 commented Mar 15, 2023

Is int8 also supported via the same codepath? #5580

@vadimkantorov no, it is not. Can you share why int8 support is interesting compared to uint8?

Contributor

vadimkantorov commented Mar 15, 2023

In some rare cases it could be useful for resizing label maps with values like -1, 0, 1 (e.g. for speaker separation, the first speaker could be -1, the second speaker 1, and "unknown" 0). Or the other way around: -1 could mean "unknown" and 0/1 could be two class labels. A more practical case for this is dtype bool (two classes) / int16 (more classes) / int32. And if one can use a smaller memory footprint, it's useful.

And of course, it's useful for consistency - it's good UX when basic ops like interpolate are efficiently supported for all dtypes.

Support for torch.bool should be easy to implement if uint8 is already supported: just reinterpret the torch.bool input as uint8 and back. And it's very useful/natural for processing binary images / segmentation masks / time-series segmentation masks.

vfdev-5 added a commit that referenced this pull request Mar 15, 2023
## Description

- Based on #96651
  - Improved perfs for vectorized interpolate uint8 RGB-case
    - unified RGB and RGBA processing code such that RGB input is not copied into RGBA
  - Performance is now closer to Pillow-SIMD
  - RGBA case performance is unchanged after refactoring (see Source link below)
- Fixed mem pointer alignment, added more comments (reviews from #96651)

## Results

```
[------------------------------------------------------------------------------------------ Resize -----------------------------------------------------------------------------------------]
                                                                 |  Pillow (9.0.0.post1)  |  torch (2.1.0a0+gitcc42a3f) PR  |  torch (2.1.0a0+git5309c44) nightly  |  Speed-up: PR vs nightly
1 threads: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=True     |          38.8          |                56.0             |                 133.2                |            2.4
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=False    |                        |                37.5             |                 112.8                |            3.0
      3 torch.uint8 channels_last bilinear 256 -> 224 aa=True    |         128.7          |               157.0             |                 305.4                |            1.9
      3 torch.uint8 channels_last bilinear 256 -> 224 aa=False   |                        |               146.4             |                 288.7                |            2.0
      3 torch.uint8 channels_last bilinear 256 -> 320 aa=True    |         179.4          |               215.8             |                 442.5                |            2.1
      3 torch.uint8 channels_last bilinear 256 -> 320 aa=False   |                        |               212.5             |                 436.9                |            2.1
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=True     |         113.3          |               127.9             |                 464.8                |            3.6
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=False    |                        |                56.8             |                 365.5                |            6.4
      3 torch.uint8 channels_last bilinear 520 -> 224 aa=True    |         281.7          |               325.2             |                 722.4                |            2.2
      3 torch.uint8 channels_last bilinear 520 -> 224 aa=False   |                        |               239.1             |                 593.5                |            2.5
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=True     |         186.2          |               200.7             |                 833.8                |            4.2
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=False    |                        |                75.2             |                 651.4                |            8.7
      3 torch.uint8 channels_last bilinear 712 -> 224 aa=True    |         410.0          |               444.5             |                1128.4                |            2.5
      3 torch.uint8 channels_last bilinear 712 -> 224 aa=False   |                        |               309.3             |                 917.6                |            3.0
```

Note: for other cases (see Source below) the speed-up is roughly 1.0 +/- 0.1, which may be attributed to measurement noise.

[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230315-144416-pr_vs_nightly_speedup-md)


## Context

- #90771



cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
vfdev-5 added a commit that referenced this pull request Mar 15, 2023
- Based on #96651
- Fixed mem pointer alignment

ghstack-source-id: d961810
Pull Request resolved: #96848
Contributor Author

vfdev-5 commented Mar 15, 2023

@vadimkantorov thanks for the explanation; however, I'm not sure I understand your point here:

In some rare cases it could be useful for resizing label maps with values like -1, 0, 1 (e.g. for speaker separation, the first speaker could be -1, the second speaker 1, and "unknown" 0). Or the other way around: -1 could mean "unknown" and 0/1 could be two class labels. A more practical case for this is dtype bool (two classes) / int16 (more classes) / int32. And if one can use a smaller memory footprint, it's useful.

Resizing label maps of any dtype should use mode="nearest" in order to keep the same labels, right? In this PR we are optimizing the uint8 RGB/RGBA cases for bilinear mode with/without anti-aliasing.
Nearest mode is already supported for dtypes like byte (uint8) and all floats. Adding bool, int16, int32 shouldn't be a big deal, IMO...
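The point about nearest mode can be illustrated with a small sketch (pure Python; `resize_nearest` is a hypothetical helper, not a torch API): nearest-neighbor resizing only ever copies existing values, so label maps keep their exact labels regardless of dtype.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a 2D list-of-lists label map.
    The output contains only values present in the input, so labels
    (-1 / 0 / 1, bool, int16, ...) survive unchanged."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[int(y * in_h / out_h)][int(x * in_w / out_w)]
         for x in range(out_w)]
        for y in range(out_h)
    ]

labels = [[-1, -1, 0, 0],
          [-1, -1, 0, 0],
          [ 1,  1, 0, 0],
          [ 1,  1, 0, 0]]
small = resize_nearest(labels, 2, 2)
# small == [[-1, 0], [1, 0]] -- no new values were invented
```

A bilinear kernel, by contrast, would blend -1 and 1 into intermediate values that are not valid labels, which is why label maps are resized with nearest mode.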

Contributor

vadimkantorov commented Mar 16, 2023

int16 is also useful for "resampling" PCM audio, where int16 is quite common, and there proper filtering / anti-aliasing during resampling would be better than simple "nearest".
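A rough sketch of that difference for 1D int16 data (illustrative pure Python, not the PR's kernel; `resample_linear_aa` is a hypothetical name): resampling with a widened triangle filter averages neighboring samples instead of picking the nearest one, suppressing aliasing.

```python
def resample_linear_aa(samples, out_len):
    """Downsample a 1D int16 sequence with a widened triangle filter
    (simple anti-aliasing), instead of nearest-sample picking."""
    in_len = len(samples)
    scale = in_len / out_len
    support = max(scale, 1.0)  # widen the filter when downsampling
    out = []
    for i in range(out_len):
        center = (i + 0.5) * scale
        lo = max(int(center - support + 0.5), 0)
        hi = min(int(center + support + 0.5), in_len)
        # triangle weights over the window, normalized below
        ws = [max(0.0, 1.0 - abs((x + 0.5 - center) / support))
              for x in range(lo, hi)]
        total = sum(ws)
        acc = sum(w * samples[x] for w, x in zip(ws, range(lo, hi)))
        out.append(int(round(acc / total)))
    return out
```

Since the output is a weighted average of int16 inputs, it stays within the int16 range, so the same accumulate-and-shift structure as the uint8 image kernels would apply.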

vfdev-5 added a commit that referenced this pull request Mar 17, 2023
- Based on #96651
- Fixed mem pointer alignment

ghstack-source-id: 4ee5e45
Pull Request resolved: #96848
vfdev-5 added a commit that referenced this pull request Mar 17, 2023
…cpu uint8 RGB-case"


## Description

- Based on #96651
  - Improved perfs for vectorized interpolate uint8 RGB-case
    - unified RGB and RGBA processing code such that RGB input is not copied into RGBA
  - Performance is now closer to Pillow-SIMD
  - RGBA case performance is unchanged after refactoring (see Source link below)
- Fixed mem pointer alignment, added more comments (reviews from #96651)

## Results

```
[------------------------------------------------------------------------------------------ Resize -----------------------------------------------------------------------------------------]
                                                                 |  Pillow (9.0.0.post1)  |  torch (2.1.0a0+git0968a5d) PR  |  torch (2.1.0a0+git5309c44) nightly  |  Speed-up: PR vs nightly
1 threads: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=True     |          39.0          |                56.6             |                 133.2                |            2.4
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=False    |                        |                36.9             |                 112.8                |            3.1
      3 torch.uint8 channels_last bilinear 256 -> 224 aa=True    |         128.1          |               152.5             |                 305.4                |            2.0
      3 torch.uint8 channels_last bilinear 256 -> 224 aa=False   |                        |               141.1             |                 288.7                |            2.0
      3 torch.uint8 channels_last bilinear 256 -> 320 aa=True    |         179.6          |               208.8             |                 442.5                |            2.1
      3 torch.uint8 channels_last bilinear 256 -> 320 aa=False   |                        |               206.4             |                 436.9                |            2.1
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=True     |         113.3          |               132.1             |                 464.8                |            3.5
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=False    |                        |                57.2             |                 365.5                |            6.4
      3 torch.uint8 channels_last bilinear 520 -> 224 aa=True    |         281.7          |               327.4             |                 722.4                |            2.2
      3 torch.uint8 channels_last bilinear 520 -> 224 aa=False   |                        |               230.2             |                 593.5                |            2.6
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=True     |         186.9          |               210.5             |                 833.8                |            4.0
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=False    |                        |                75.6             |                 651.4                |            8.6
      3 torch.uint8 channels_last bilinear 712 -> 224 aa=True    |         410.3          |               450.9             |                1128.4                |            2.5
      3 torch.uint8 channels_last bilinear 712 -> 224 aa=False   |                        |               298.7             |                 917.6                |            3.1

```

Note: for other cases (see Source below) the speed-up is roughly 1.0 +/- 0.1, which may be attributed to measurement noise.

[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230315-162238-pr_vs_nightly_speedup-md)


## Context

- #90771



cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
vfdev-5 added a commit that referenced this pull request Mar 17, 2023
vfdev-5 added a commit that referenced this pull request Mar 20, 2023
…erpolate cpu uint8 RGB-case (channels last)"


## Description

- Based on #96651
  - Improved perfs for vectorized bilinear interpolate uint8 RGB-case, channels last
    - unified RGB and RGBA processing code such that RGB input is not copied into RGBA
  - Performance is now closer to Pillow-SIMD (`Pillow (9.0.0.post1)`)
  - RGBA case performance is unchanged after refactoring (see Source link below)
- Fixed mem pointer alignment, added more comments (reviews from #96651)

## Results

```
[------------------------------------------------------------------------------------------ Resize -----------------------------------------------------------------------------------------]
                                                                 |  Pillow (9.0.0.post1)  |  torch (2.1.0a0+git0968a5d) PR  |  torch (2.1.0a0+git5309c44) nightly  |  Speed-up: PR vs nightly
1 threads: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=True     |          39.0          |                56.6             |                 133.2                |            2.4
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=False    |                        |                36.9             |                 112.8                |            3.1
      3 torch.uint8 channels_last bilinear 256 -> 224 aa=True    |         128.1          |               152.5             |                 305.4                |            2.0
      3 torch.uint8 channels_last bilinear 256 -> 224 aa=False   |                        |               141.1             |                 288.7                |            2.0
      3 torch.uint8 channels_last bilinear 256 -> 320 aa=True    |         179.6          |               208.8             |                 442.5                |            2.1
      3 torch.uint8 channels_last bilinear 256 -> 320 aa=False   |                        |               206.4             |                 436.9                |            2.1
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=True     |         113.3          |               132.1             |                 464.8                |            3.5
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=False    |                        |                57.2             |                 365.5                |            6.4
      3 torch.uint8 channels_last bilinear 520 -> 224 aa=True    |         281.7          |               327.4             |                 722.4                |            2.2
      3 torch.uint8 channels_last bilinear 520 -> 224 aa=False   |                        |               230.2             |                 593.5                |            2.6
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=True     |         186.9          |               210.5             |                 833.8                |            4.0
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=False    |                        |                75.6             |                 651.4                |            8.6
      3 torch.uint8 channels_last bilinear 712 -> 224 aa=True    |         410.3          |               450.9             |                1128.4                |            2.5
      3 torch.uint8 channels_last bilinear 712 -> 224 aa=False   |                        |               298.7             |                 917.6                |            3.1

```

Note: for other cases (see Source below) the speed-up is roughly 1.0 +/- 0.1, which may be attributed to measurement noise.

[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230315-162238-pr_vs_nightly_speedup-md)


## Context

- #90771



cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
vfdev-5 added a commit that referenced this pull request Mar 20, 2023
- Based on #96651
- Fixed mem pointer alignment

ghstack-source-id: c82a73d
Pull Request resolved: #96848
vfdev-5 added a commit that referenced this pull request Mar 20, 2023
…t8 RGB-case (channels last)"


vfdev-5 added a commit to vfdev-5/pytorch that referenced this pull request Mar 21, 2023
- Based on pytorch#96651
- Fixed mem pointer alignment

ghstack-source-id: c82a73d
Pull Request resolved: pytorch#96848
vfdev-5 added a commit that referenced this pull request Mar 21, 2023
…erpolate cpu uint8 RGB-case (channels last)"


## Description

- Based on #96651
  - Improved perfs for vectorized **bilinear** interpolate uint8 RGB-case, **channels last**
    - unified RGB and RGBA processing code such that RGB input is not copied into RGBA
  - Performance is now closer to Pillow-SIMD (labeled as `Pillow (9.0.0.post1)` in the results)
  - RGBA case performance is unchanged after refactoring (see Source link below)
- Fixed mem pointer alignment, added more comments (reviews from #96651)

## Results

- `Pillow (9.0.0.post1)` == Pillow-SIMD

```
[-------------------------------------------------------------------------------------------------- Resize -------------------------------------------------------------------------------------------------]
                                                                                 |  Pillow (9.0.0.post1)  |  torch (2.1.0a0+gitc005105) PR  |  torch (2.1.0a0+git5309c44) nightly  |  Speed-up: PR vs nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=True        |    38.670 (+-0.445)    |         57.366 (+-0.799)        |          132.147 (+-1.236)           |      2.304 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=False       |                        |         37.825 (+-0.417)        |          111.789 (+-1.175)           |      2.955 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=True      |   127.898 (+-1.335)    |        153.081 (+-2.346)        |          302.518 (+-2.632)           |      1.976 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=False     |                        |        141.695 (+-1.415)        |          286.663 (+-2.494)           |      2.023 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=True      |   179.735 (+-2.054)    |        210.613 (+-3.116)        |          439.375 (+-4.014)           |      2.086 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=False     |                        |        207.601 (+-1.639)        |          438.537 (+-4.143)           |      2.112 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=True        |   112.679 (+-1.321)    |        130.863 (+-1.987)        |          446.804 (+-3.283)           |      3.414 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=False       |                        |         57.968 (+-0.270)        |          374.244 (+-13.598)          |      6.456 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=True      |   282.398 (+-3.485)    |        322.986 (+-1.947)        |          720.197 (+-3.467)           |      2.230 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=False     |                        |        231.625 (+-2.006)        |          592.834 (+-3.903)           |      2.559 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=True        |   185.711 (+-1.666)    |        201.069 (+-2.182)        |          787.868 (+-3.648)           |      3.918 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=False       |                        |         75.975 (+-0.696)        |          651.016 (+-3.926)           |      8.569 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=True      |   410.236 (+-6.021)    |        451.486 (+-3.939)        |         1123.923 (+-14.988)          |      2.489 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=False     |                        |        299.597 (+-1.887)        |          915.347 (+-4.486)           |      3.055 (+-0.000)    

      # More test-cases from #90771
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=True        |    60.751 (+-0.285)    |         78.538 (+-1.282)        |          170.465 (+-1.830)           |      2.170 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=True      |   133.619 (+-2.035)    |        159.614 (+-1.587)        |          330.971 (+-3.249)           |      2.074 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=True    |   950.243 (+-10.641)   |        891.369 (+-17.946)       |         2805.510 (+-25.503)          |      3.147 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=True        |    52.771 (+-0.961)    |         72.253 (+-1.020)        |          135.933 (+-1.625)           |      1.881 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=True      |   139.107 (+-2.143)    |        165.844 (+-2.177)        |          321.112 (+-2.904)           |      1.936 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=True    |   691.470 (+-9.566)    |        764.942 (+-11.192)       |         2050.880 (+-22.188)          |      2.681 (+-0.000)    
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=False       |                        |         77.375 (+-1.345)        |          169.646 (+-1.640)           |      2.193 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=False     |                        |        159.115 (+-3.935)        |          329.754 (+-2.590)           |      2.072 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=False   |                        |        877.248 (+-5.736)        |         2815.870 (+-22.589)          |      3.210 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=False       |                        |         53.120 (+-0.316)        |          112.024 (+-1.225)           |      2.109 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=False     |                        |        147.330 (+-1.871)        |          299.152 (+-3.353)           |      2.030 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=False   |                        |        472.182 (+-10.785)       |         1698.601 (+-16.785)          |      3.597 (+-0.000)    
```

Note: for the other cases (see Source below) the speed-up is roughly 1.0 +/- 0.1, which can likely be attributed to measurement noise.

[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230320-160044-pr_vs_nightly-speedup-md)
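A hedged sketch of how a single entry in the table above can be reproduced with `torch.utils.benchmark` (the input shape, target size, and run count are illustrative, not the exact harness used for these numbers):

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark

img = torch.randint(0, 256, (1, 3, 256, 256), dtype=torch.uint8)
img = img.contiguous(memory_format=torch.channels_last)

# Single-threaded timing, matching the "1 threads" rows in the table.
timer = benchmark.Timer(
    stmt="F.interpolate(img, size=(224, 224), mode='bilinear', antialias=True)",
    globals={"F": F, "img": img},
    num_threads=1,
)
m = timer.timeit(100)  # Measurement over 100 runs; m.mean is in seconds
```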


## Context

- #90771



cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
vfdev-5 added a commit that referenced this pull request Mar 21, 2023
…t8 RGB-case (channels last)"


## Description

- Based on #96651
  - Improved performance of the vectorized **bilinear** interpolate uint8 RGB case, **channels last**
    - Unified the RGB and RGBA processing code so that RGB input is no longer copied into an RGBA buffer
  - Performance is now closer to Pillow-SIMD (labeled as `Pillow (9.0.0.post1)` in the results)
  - RGBA performance is unchanged after the refactoring (see the Source link below)
- Fixed memory pointer alignment and added more comments (addressing reviews from #96651)

## Results

- `Pillow (9.0.0.post1)` == Pillow-SIMD

```
[-------------------------------------------------------------------------------------------------- Resize -------------------------------------------------------------------------------------------------]
                                                                                 |  Pillow (9.0.0.post1)  |  torch (2.1.0a0+gitc005105) PR  |  torch (2.1.0a0+git5309c44) nightly  |  Speed-up: PR vs nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=True        |    38.670 (+-0.445)    |         57.366 (+-0.799)        |          132.147 (+-1.236)           |      2.304 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=False       |                        |         37.825 (+-0.417)        |          111.789 (+-1.175)           |      2.955 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=True      |   127.898 (+-1.335)    |        153.081 (+-2.346)        |          302.518 (+-2.632)           |      1.976 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=False     |                        |        141.695 (+-1.415)        |          286.663 (+-2.494)           |      2.023 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=True      |   179.735 (+-2.054)    |        210.613 (+-3.116)        |          439.375 (+-4.014)           |      2.086 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=False     |                        |        207.601 (+-1.639)        |          438.537 (+-4.143)           |      2.112 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=True        |   112.679 (+-1.321)    |        130.863 (+-1.987)        |          446.804 (+-3.283)           |      3.414 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=False       |                        |         57.968 (+-0.270)        |          374.244 (+-13.598)          |      6.456 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=True      |   282.398 (+-3.485)    |        322.986 (+-1.947)        |          720.197 (+-3.467)           |      2.230 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=False     |                        |        231.625 (+-2.006)        |          592.834 (+-3.903)           |      2.559 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=True        |   185.711 (+-1.666)    |        201.069 (+-2.182)        |          787.868 (+-3.648)           |      3.918 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=False       |                        |         75.975 (+-0.696)        |          651.016 (+-3.926)           |      8.569 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=True      |   410.236 (+-6.021)    |        451.486 (+-3.939)        |         1123.923 (+-14.988)          |      2.489 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=False     |                        |        299.597 (+-1.887)        |          915.347 (+-4.486)           |      3.055 (+-0.000)    

      # More test-cases from #90771
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=True        |    60.751 (+-0.285)    |         78.538 (+-1.282)        |          170.465 (+-1.830)           |      2.170 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=True      |   133.619 (+-2.035)    |        159.614 (+-1.587)        |          330.971 (+-3.249)           |      2.074 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=True    |   950.243 (+-10.641)   |        891.369 (+-17.946)       |         2805.510 (+-25.503)          |      3.147 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=True        |    52.771 (+-0.961)    |         72.253 (+-1.020)        |          135.933 (+-1.625)           |      1.881 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=True      |   139.107 (+-2.143)    |        165.844 (+-2.177)        |          321.112 (+-2.904)           |      1.936 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=True    |   691.470 (+-9.566)    |        764.942 (+-11.192)       |         2050.880 (+-22.188)          |      2.681 (+-0.000)    
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=False       |                        |         77.375 (+-1.345)        |          169.646 (+-1.640)           |      2.193 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=False     |                        |        159.115 (+-3.935)        |          329.754 (+-2.590)           |      2.072 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=False   |                        |        877.248 (+-5.736)        |         2815.870 (+-22.589)          |      3.210 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=False       |                        |         53.120 (+-0.316)        |          112.024 (+-1.225)           |      2.109 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=False     |                        |        147.330 (+-1.871)        |          299.152 (+-3.353)           |      2.030 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=False   |                        |        472.182 (+-10.785)       |         1698.601 (+-16.785)          |      3.597 (+-0.000)    
```

Note: for the other cases (see Source below) the speed-up is roughly 1.0 +/- 0.1, which can likely be attributed to measurement noise.

[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230320-160044-pr_vs_nightly-speedup-md)


## Context

- #90771



cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
vfdev-5 added a commit that referenced this pull request Mar 21, 2023
- Based on #96651
- Fixed mem pointer alignment

ghstack-source-id: f807362
Pull Request resolved: #96848
vfdev-5 added a commit that referenced this pull request Mar 22, 2023
…erpolate cpu uint8 RGB-case (channels last)"


## Description

- Based on #96651
  - Improved performance of the vectorized **bilinear** interpolate uint8 RGB case, **channels last**
    - Unified the RGB and RGBA processing code so that RGB input is no longer copied into an RGBA buffer
  - Performance is now closer to Pillow-SIMD (labeled as `Pillow (9.0.0.post1)` in the results)
  - RGBA performance is unchanged after the refactoring (see the Source link below)
- Fixed memory pointer alignment and added more comments (addressing reviews from #96651)

## Results

- `Pillow (9.0.0.post1)` == Pillow-SIMD

```
[-------------------------------------------------------------------------------------------------- Resize -------------------------------------------------------------------------------------------------]
                                                                                 |  Pillow (9.0.0.post1)  |  torch (2.1.0a0+git8d955df) PR  |  torch (2.1.0a0+git5309c44) nightly  |  Speed-up: PR vs nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=True        |    38.649 (+-0.306)    |         55.828 (+-0.370)        |          132.147 (+-1.236)           |      2.367 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=False       |                        |         36.826 (+-0.229)        |          111.789 (+-1.175)           |      3.036 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=True      |   128.233 (+-1.313)    |        153.827 (+-1.229)        |          302.518 (+-2.632)           |      1.967 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=False     |                        |        143.886 (+-1.409)        |          286.663 (+-2.494)           |      1.992 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=True      |   179.504 (+-1.825)    |        211.569 (+-1.336)        |          439.375 (+-4.014)           |      2.077 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=False     |                        |        209.888 (+-1.443)        |          438.537 (+-4.143)           |      2.089 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=True        |   112.891 (+-1.118)    |        129.373 (+-1.396)        |          446.804 (+-3.283)           |      3.454 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=False       |                        |         56.858 (+-0.227)        |          374.244 (+-13.598)          |      6.582 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=True      |   282.917 (+-2.992)    |        324.378 (+-1.694)        |          720.197 (+-3.467)           |      2.220 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=False     |                        |        236.078 (+-1.679)        |          592.834 (+-3.903)           |      2.511 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=True        |   185.595 (+-1.633)    |        202.000 (+-1.920)        |          787.868 (+-3.648)           |      3.900 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=False       |                        |         75.421 (+-0.512)        |          651.016 (+-3.926)           |      8.632 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=True      |   409.691 (+-2.735)    |        449.927 (+-2.500)        |         1123.923 (+-14.988)          |      2.498 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=False     |                        |        306.691 (+-2.095)        |          915.347 (+-4.486)           |      2.985 (+-0.000)    

      # More test-cases from #90771
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=True        |    60.740 (+-0.278)    |         78.745 (+-0.286)        |          170.465 (+-1.830)           |      2.165 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=True      |   133.029 (+-1.619)    |        162.393 (+-1.289)        |          330.971 (+-3.249)           |      2.038 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=True    |   948.849 (+-2.749)    |        896.127 (+-3.696)        |         2805.510 (+-25.503)          |      3.131 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=True        |    52.505 (+-0.319)    |         70.617 (+-0.344)        |          135.933 (+-1.625)           |      1.925 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=True      |   138.671 (+-1.953)    |        165.638 (+-1.473)        |          321.112 (+-2.904)           |      1.939 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=True    |   689.492 (+-2.917)    |        758.162 (+-3.719)        |         2050.880 (+-22.188)          |      2.705 (+-0.000)    
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=False       |                        |         77.300 (+-0.307)        |          169.646 (+-1.640)           |      2.195 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=False     |                        |        159.525 (+-1.225)        |          329.754 (+-2.590)           |      2.067 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=False   |                        |        890.106 (+-3.358)        |         2815.870 (+-22.589)          |      3.164 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=False       |                        |         52.399 (+-0.314)        |          112.024 (+-1.225)           |      2.138 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=False     |                        |        148.780 (+-1.282)        |          299.152 (+-3.353)           |      2.011 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=False   |                        |        479.273 (+-3.432)        |         1698.601 (+-16.785)          |      3.544 (+-0.000)    
```

Note: there is no performance regression for the other cases. Some cases (see Source below) show small speed-ups; for the rest the ratio is roughly 1.0 +/- 0.1, which can likely be attributed to measurement noise.

[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230321-145513-pr_vs_nightly-speedup-md)


## Context

- #90771



cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
vfdev-5 added a commit that referenced this pull request Mar 22, 2023
…t8 RGB-case (channels last)"


## Description

- Based on #96651
  - Improved performance of the vectorized **bilinear** interpolate uint8 RGB case, **channels last**
    - Unified the RGB and RGBA processing code so that RGB input is no longer copied into an RGBA buffer
  - Performance is now closer to Pillow-SIMD (labeled as `Pillow (9.0.0.post1)` in the results)
  - RGBA performance is unchanged after the refactoring (see the Source link below)
- Fixed memory pointer alignment and added more comments (addressing reviews from #96651)

## Results

- `Pillow (9.0.0.post1)` == Pillow-SIMD

```
[-------------------------------------------------------------------------------------------------- Resize -------------------------------------------------------------------------------------------------]
                                                                                 |  Pillow (9.0.0.post1)  |  torch (2.1.0a0+git8d955df) PR  |  torch (2.1.0a0+git5309c44) nightly  |  Speed-up: PR vs nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=True        |    38.649 (+-0.306)    |         55.828 (+-0.370)        |          132.147 (+-1.236)           |      2.367 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=False       |                        |         36.826 (+-0.229)        |          111.789 (+-1.175)           |      3.036 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=True      |   128.233 (+-1.313)    |        153.827 (+-1.229)        |          302.518 (+-2.632)           |      1.967 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=False     |                        |        143.886 (+-1.409)        |          286.663 (+-2.494)           |      1.992 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=True      |   179.504 (+-1.825)    |        211.569 (+-1.336)        |          439.375 (+-4.014)           |      2.077 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=False     |                        |        209.888 (+-1.443)        |          438.537 (+-4.143)           |      2.089 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=True        |   112.891 (+-1.118)    |        129.373 (+-1.396)        |          446.804 (+-3.283)           |      3.454 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=False       |                        |         56.858 (+-0.227)        |          374.244 (+-13.598)          |      6.582 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=True      |   282.917 (+-2.992)    |        324.378 (+-1.694)        |          720.197 (+-3.467)           |      2.220 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=False     |                        |        236.078 (+-1.679)        |          592.834 (+-3.903)           |      2.511 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=True        |   185.595 (+-1.633)    |        202.000 (+-1.920)        |          787.868 (+-3.648)           |      3.900 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=False       |                        |         75.421 (+-0.512)        |          651.016 (+-3.926)           |      8.632 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=True      |   409.691 (+-2.735)    |        449.927 (+-2.500)        |         1123.923 (+-14.988)          |      2.498 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=False     |                        |        306.691 (+-2.095)        |          915.347 (+-4.486)           |      2.985 (+-0.000)    

      # More test-cases from #90771
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=True        |    60.740 (+-0.278)    |         78.745 (+-0.286)        |          170.465 (+-1.830)           |      2.165 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=True      |   133.029 (+-1.619)    |        162.393 (+-1.289)        |          330.971 (+-3.249)           |      2.038 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=True    |   948.849 (+-2.749)    |        896.127 (+-3.696)        |         2805.510 (+-25.503)          |      3.131 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=True        |    52.505 (+-0.319)    |         70.617 (+-0.344)        |          135.933 (+-1.625)           |      1.925 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=True      |   138.671 (+-1.953)    |        165.638 (+-1.473)        |          321.112 (+-2.904)           |      1.939 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=True    |   689.492 (+-2.917)    |        758.162 (+-3.719)        |         2050.880 (+-22.188)          |      2.705 (+-0.000)    
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=False       |                        |         77.300 (+-0.307)        |          169.646 (+-1.640)           |      2.195 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=False     |                        |        159.525 (+-1.225)        |          329.754 (+-2.590)           |      2.067 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=False   |                        |        890.106 (+-3.358)        |         2815.870 (+-22.589)          |      3.164 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=False       |                        |         52.399 (+-0.314)        |          112.024 (+-1.225)           |      2.138 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=False     |                        |        148.780 (+-1.282)        |          299.152 (+-3.353)           |      2.011 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=False   |                        |        479.273 (+-3.432)        |         1698.601 (+-16.785)          |      3.544 (+-0.000)    
```

Note: there is no performance regression for the other cases. Some cases (see Source below) show small speed-ups; for the rest the ratio is roughly 1.0 +/- 0.1, which can likely be attributed to measurement noise.

[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230321-145513-pr_vs_nightly-speedup-md)


## Context

- #90771



cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
vfdev-5 added a commit that referenced this pull request Mar 22, 2023
- Based on #96651
- Fixed mem pointer alignment

ghstack-source-id: 6132906
Pull Request resolved: #96848
vfdev-5 added a commit that referenced this pull request Mar 23, 2023
…erpolate cpu uint8 RGB-case (channels last)"


## Description

- Based on #96651
  - Improved performance of the vectorized **bilinear** interpolate uint8 RGB case, **channels last**
    - Unified the RGB and RGBA processing code so that RGB input is no longer copied into an RGBA buffer
  - Performance is now closer to Pillow-SIMD (labeled as `Pillow (9.0.0.post1)` in the results)
  - RGBA performance is unchanged after the refactoring (see the Source link below)
- Fixed memory pointer alignment and added more comments (addressing reviews from #96651)

## Results

- `Pillow (9.0.0.post1)` == Pillow-SIMD

```
[-------------------------------------------------------------------------------------------------- Resize -------------------------------------------------------------------------------------------------]
                                                                                 |  Pillow (9.0.0.post1)  |  torch (2.1.0a0+gitce4be01) PR  |  torch (2.1.0a0+git5309c44) nightly  |  Speed-up: PR vs nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=True        |    38.548 (+-0.280)    |         57.536 (+-0.210)        |          132.147 (+-1.236)           |      2.297 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=False       |                        |         38.532 (+-0.219)        |          111.789 (+-1.175)           |      2.901 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=True      |   127.689 (+-1.348)    |        156.262 (+-1.213)        |          302.518 (+-2.632)           |      1.936 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=False     |                        |        145.483 (+-1.077)        |          286.663 (+-2.494)           |      1.970 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=True      |   178.117 (+-1.956)    |        215.053 (+-1.470)        |          439.375 (+-4.014)           |      2.043 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=False     |                        |        211.340 (+-2.239)        |          438.537 (+-4.143)           |      2.075 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=True        |   112.593 (+-1.266)    |        130.414 (+-1.633)        |          446.804 (+-3.283)           |      3.426 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=False       |                        |         58.767 (+-0.203)        |          374.244 (+-13.598)          |      6.368 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=True      |   283.210 (+-2.937)    |        324.157 (+-1.895)        |          720.197 (+-3.467)           |      2.222 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=False     |                        |        239.800 (+-2.492)        |          592.834 (+-3.903)           |      2.472 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=True        |   186.255 (+-1.629)    |        204.834 (+-1.496)        |          787.868 (+-3.648)           |      3.846 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=False       |                        |         77.335 (+-0.341)        |          651.016 (+-3.926)           |      8.418 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=True      |   410.286 (+-2.439)    |        443.934 (+-2.899)        |         1123.923 (+-14.988)          |      2.532 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=False     |                        |        312.220 (+-2.307)        |          915.347 (+-4.486)           |      2.932 (+-0.000)    

      # More test-cases from #90771
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=True        |    60.611 (+-0.337)    |         80.849 (+-1.780)        |          170.465 (+-1.830)           |      2.108 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=True      |   132.971 (+-1.624)    |        164.892 (+-1.426)        |          330.971 (+-3.249)           |      2.007 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=True    |   948.467 (+-3.179)    |        891.414 (+-5.282)        |         2805.510 (+-25.503)          |      3.147 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=True        |    52.539 (+-0.327)    |         72.471 (+-0.367)        |          135.933 (+-1.625)           |      1.876 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=True      |   138.669 (+-1.867)    |        168.628 (+-1.213)        |          321.112 (+-2.904)           |      1.904 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=True    |   689.933 (+-3.175)    |        746.911 (+-2.985)        |         2050.880 (+-22.188)          |      2.746 (+-0.000)    
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=False       |                        |         78.347 (+-0.338)        |          169.646 (+-1.640)           |      2.165 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=False     |                        |        162.194 (+-1.089)        |          329.754 (+-2.590)           |      2.033 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=False   |                        |        894.476 (+-2.738)        |         2815.870 (+-22.589)          |      3.148 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=False       |                        |         52.728 (+-0.406)        |          112.024 (+-1.225)           |      2.125 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=False     |                        |        151.560 (+-1.128)        |          299.152 (+-3.353)           |      1.974 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=False   |                        |        500.053 (+-4.288)        |         1698.601 (+-16.785)          |      3.397 (+-0.000)    
```

Note: There is no perf regression for the other cases. A few cases (see Source below) show small speed-ups; for the rest the ratio is roughly 1.0 +/- 0.1, which may be attributed to measurement noise.

[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230322-132441-pr_vs_nightly-speedup-md)
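As a minimal illustration of the per-pixel arithmetic that the vectorized kernel computes, here is a pure-Python sketch of non-antialiased bilinear sampling on a channels-last (H, W, C) uint8 image. This is not the actual C++ implementation (which also handles antialiasing, SIMD, and the RGB/RGBA unification this PR adds); the coordinate mapping assumes the `align_corners=False` convention.

```python
def bilinear_resize(img, out_h, out_w):
    # img: nested lists, shape (in_h, in_w, channels), uint8-range values.
    in_h, in_w = len(img), len(img[0])
    channels = len(img[0][0])
    out = [[[0] * channels for _ in range(out_w)] for _ in range(out_h)]
    for oy in range(out_h):
        # Source y coordinate (align_corners=False style), then the two
        # neighboring rows and the interpolation weight between them.
        sy = (oy + 0.5) * in_h / out_h - 0.5
        y0 = max(min(int(sy), in_h - 1), 0)
        y1 = min(y0 + 1, in_h - 1)
        wy = min(max(sy - y0, 0.0), 1.0)
        for ox in range(out_w):
            sx = (ox + 0.5) * in_w / out_w - 0.5
            x0 = max(min(int(sx), in_w - 1), 0)
            x1 = min(x0 + 1, in_w - 1)
            wx = min(max(sx - x0, 0.0), 1.0)
            for c in range(channels):
                # Horizontal lerp on the two rows, then vertical lerp.
                top = img[y0][x0][c] * (1 - wx) + img[y0][x1][c] * wx
                bot = img[y1][x0][c] * (1 - wx) + img[y1][x1][c] * wx
                out[oy][ox][c] = int(round(top * (1 - wy) + bot * wy))
    return out
```

In the channels-last layout the three (or four) channel values of a pixel are contiguous in memory, which is what lets the inner loop over `c` be replaced by SIMD loads without first padding RGB into RGBA.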


## Context

- #90771



cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
vfdev-5 added a commit that referenced this pull request Mar 23, 2023
…t8 RGB-case (channels last)"


vfdev-5 added a commit that referenced this pull request Mar 23, 2023
…erpolate cpu uint8 RGB-case (channels last)"


vfdev-5 added a commit that referenced this pull request Mar 23, 2023
…t8 RGB-case (channels last)"


vfdev-5 added a commit that referenced this pull request Mar 23, 2023
- Based on #96651
- Fixed mem pointer alignment

ghstack-source-id: 93bd276
Pull Request resolved: #96848
vfdev-5 added a commit that referenced this pull request Mar 29, 2023
…erpolate cpu uint8 RGB-case (channels last)"


vfdev-5 added a commit that referenced this pull request Mar 29, 2023
- Based on #96651
- Fixed mem pointer alignment

ghstack-source-id: 8b90000
Pull Request resolved: #96848
vfdev-5 added a commit that referenced this pull request Mar 29, 2023
…t8 RGB-case (channels last)"



      # More test-cases from #90771
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=True        |    60.611 (+-0.337)    |         80.849 (+-1.780)        |          170.465 (+-1.830)           |      2.108 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=True      |   132.971 (+-1.624)    |        164.892 (+-1.426)        |          330.971 (+-3.249)           |      2.007 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=True    |   948.467 (+-3.179)    |        891.414 (+-5.282)        |         2805.510 (+-25.503)          |      3.147 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=True        |    52.539 (+-0.327)    |         72.471 (+-0.367)        |          135.933 (+-1.625)           |      1.876 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=True      |   138.669 (+-1.867)    |        168.628 (+-1.213)        |          321.112 (+-2.904)           |      1.904 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=True    |   689.933 (+-3.175)    |        746.911 (+-2.985)        |         2050.880 (+-22.188)          |      2.746 (+-0.000)    
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=False       |                        |         78.347 (+-0.338)        |          169.646 (+-1.640)           |      2.165 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=False     |                        |        162.194 (+-1.089)        |          329.754 (+-2.590)           |      2.033 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=False   |                        |        894.476 (+-2.738)        |         2815.870 (+-22.589)          |      3.148 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=False       |                        |         52.728 (+-0.406)        |          112.024 (+-1.225)           |      2.125 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=False     |                        |        151.560 (+-1.128)        |          299.152 (+-3.353)           |      1.974 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=False   |                        |        500.053 (+-4.288)        |         1698.601 (+-16.785)          |      3.397 (+-0.000)    
```

Note: there is no perf regression for the other cases. Some cases (see Source below) show small speed-ups; for the rest the ratio is roughly 1.0 +/- 0.1, which can be attributed to measurement noise.

[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230322-132441-pr_vs_nightly-speedup-md)
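A single benchmarked configuration can be reproduced with a minimal sketch (tensor contents are random; only the shape, dtype, and memory format match the table above):

```python
import torch
import torch.nn.functional as F

# 3-channel uint8 image, channels-last, resized 256x256 -> 224x224 with antialias,
# matching the "3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=True" row
img = torch.randint(0, 256, (1, 3, 256, 256), dtype=torch.uint8)
img = img.contiguous(memory_format=torch.channels_last)

out = F.interpolate(img, size=(224, 224), mode="bilinear", antialias=True)
print(out.shape, out.dtype)  # torch.Size([1, 3, 224, 224]) torch.uint8
```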


## Context

- #90771



cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Mar 30, 2023
- Based on #96651
- Fixed mem pointer alignment

ghstack-source-id: 6c30da9
Pull Request resolved: #96848
pytorchmergebot pushed a commit that referenced this pull request Mar 30, 2023
…erpolate cpu uint8 RGB-case (channels last)"


## Description

- Based on #96651
  - Improved performance of the vectorized **bilinear** interpolation for the uint8 RGB case, **channels last**
    - unified the RGB and RGBA processing code so that RGB input is no longer copied into RGBA
  - Performance is now closer to Pillow-SIMD (labeled as `Pillow (9.0.0.post1)` in the results)
  - RGBA-case performance is unchanged after the refactoring (see Source link below)
- Fixed mem pointer alignment, added more comments (reviews from #96651)

## Results

- `Pillow (9.0.0.post1)` == Pillow-SIMD

```
[-------------------------------------------------------------------------------------------------- Resize -------------------------------------------------------------------------------------------------]
                                                                                 |  Pillow (9.0.0.post1)  |  torch (2.1.0a0+gitd6e220c) PR  |  torch (2.1.0a0+git2b75955) nightly  |  Speed-up: PR vs nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=True        |    38.674 (+-0.323)    |         57.591 (+-0.244)        |          131.033 (+-1.448)           |      2.275 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=False       |                        |         39.471 (+-0.166)        |          113.911 (+-1.736)           |      2.886 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=True      |   128.512 (+-1.916)    |        161.592 (+-1.242)        |          299.679 (+-2.099)           |      1.855 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=False     |                        |        150.994 (+-1.180)        |          285.331 (+-1.919)           |      1.890 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=True      |   180.045 (+-2.223)    |        220.581 (+-1.363)        |          431.057 (+-3.536)           |      1.954 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=False     |                        |        219.391 (+-1.409)        |          429.410 (+-3.620)           |      1.957 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=True        |   113.911 (+-1.024)    |        129.457 (+-1.295)        |          459.610 (+-13.322)          |      3.550 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=False       |                        |         59.800 (+-0.199)        |          400.015 (+-11.815)          |      6.689 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=True      |   283.050 (+-2.664)    |        339.143 (+-1.209)        |          683.555 (+-4.466)           |      2.016 (+-0.000)    
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=False     |                        |        250.601 (+-1.236)        |          603.545 (+-2.644)           |      2.408 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=True        |   186.723 (+-2.213)    |        199.960 (+-1.343)        |          860.867 (+-21.763)          |      4.305 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=False       |                        |         79.188 (+-0.261)        |          703.019 (+-25.805)          |      8.878 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=True      |   412.353 (+-4.476)    |        462.230 (+-1.983)        |         1101.673 (+-49.299)          |      2.383 (+-0.000)    
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=False     |                        |        327.973 (+-1.852)        |          941.062 (+-5.549)           |      2.869 (+-0.000)    

      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=True        |    61.191 (+-0.926)    |         80.795 (+-0.518)        |          160.853 (+-1.506)           |      1.991 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=True      |   134.488 (+-2.129)    |        169.147 (+-1.324)        |          327.343 (+-2.846)           |      1.935 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=True    |  1037.045 (+-24.982)   |        938.623 (+-9.010)        |         2603.360 (+-20.530)          |      2.774 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=True        |    52.792 (+-0.613)    |         73.692 (+-0.264)        |          131.829 (+-1.333)           |      1.789 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=True      |   139.596 (+-1.944)    |        173.778 (+-1.039)        |          320.063 (+-2.562)           |      1.842 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=True    |   690.132 (+-10.946)   |        772.758 (+-2.864)        |         2036.860 (+-36.109)          |      2.636 (+-0.000)    
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=False       |                        |         78.747 (+-0.799)        |          158.479 (+-1.702)           |      2.013 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=False     |                        |        167.046 (+-1.077)        |          322.104 (+-2.764)           |      1.928 (+-0.000)    
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=False   |                        |        918.967 (+-5.251)        |         2611.388 (+-29.917)          |      2.842 (+-0.000)    
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=False       |                        |         55.336 (+-0.251)        |          113.869 (+-1.243)           |      2.058 (+-0.000)    
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=False     |                        |        156.505 (+-1.095)        |          299.861 (+-2.710)           |      1.916 (+-0.000)    
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=False   |                        |        514.344 (+-1.905)        |         1776.796 (+-19.660)          |      3.454 (+-0.000)    

```

Note: there is no perf regression for the other cases. Some cases (see Source below) show small speed-ups; for the rest the ratio is roughly 1.0 +/- 0.1, which can be attributed to measurement noise.

[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230329-181023-pr_vs_nightly-speedup-md)
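The per-row timings can be approximated with `torch.utils.benchmark` on a single thread, as in the tables (a sketch; the actual benchmark script lives in the linked gist):

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark

img = torch.randint(0, 256, (1, 3, 256, 256), dtype=torch.uint8)
img = img.contiguous(memory_format=torch.channels_last)

t = benchmark.Timer(
    stmt="F.interpolate(img, size=(32, 32), mode='bilinear', antialias=True)",
    globals={"F": F, "img": img},
    num_threads=1,  # the tables above report single-thread timings
)
m = t.blocked_autorange()
print(m)
```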


## Context

- #90771



cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 datumbox pmeier

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Mar 30, 2023
… (channels last) (#96848)

## Description

- Based on #96651
  - Improved performance of the vectorized **bilinear** interpolation for the uint8 RGB case, **channels last**
    - unified the RGB and RGBA processing code so that RGB input is no longer copied into RGBA
  - Performance is now closer to Pillow-SIMD (labeled as `Pillow (9.0.0.post1)` in the results)
  - RGBA-case performance is unchanged after the refactoring (see Source link below)
- Fixed mem pointer alignment, added more comments (reviews from #96651)

## Results

- `Pillow (9.0.0.post1)` == Pillow-SIMD

```
[-------------------------------------------------------------------------------------------------- Resize -------------------------------------------------------------------------------------------------]
                                                                                 |  Pillow (9.0.0.post1)  |  torch (2.1.0a0+gitd6e220c) PR  |  torch (2.1.0a0+git2b75955) nightly  |  Speed-up: PR vs nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=True        |    38.674 (+-0.323)    |         57.591 (+-0.244)        |          131.033 (+-1.448)           |      2.275 (+-0.000)
      3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=False       |                        |         39.471 (+-0.166)        |          113.911 (+-1.736)           |      2.886 (+-0.000)
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=True      |   128.512 (+-1.916)    |        161.592 (+-1.242)        |          299.679 (+-2.099)           |      1.855 (+-0.000)
      3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=False     |                        |        150.994 (+-1.180)        |          285.331 (+-1.919)           |      1.890 (+-0.000)
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=True      |   180.045 (+-2.223)    |        220.581 (+-1.363)        |          431.057 (+-3.536)           |      1.954 (+-0.000)
      3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=False     |                        |        219.391 (+-1.409)        |          429.410 (+-3.620)           |      1.957 (+-0.000)
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=True        |   113.911 (+-1.024)    |        129.457 (+-1.295)        |          459.610 (+-13.322)          |      3.550 (+-0.000)
      3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=False       |                        |         59.800 (+-0.199)        |          400.015 (+-11.815)          |      6.689 (+-0.000)
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=True      |   283.050 (+-2.664)    |        339.143 (+-1.209)        |          683.555 (+-4.466)           |      2.016 (+-0.000)
      3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=False     |                        |        250.601 (+-1.236)        |          603.545 (+-2.644)           |      2.408 (+-0.000)
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=True        |   186.723 (+-2.213)    |        199.960 (+-1.343)        |          860.867 (+-21.763)          |      4.305 (+-0.000)
      3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=False       |                        |         79.188 (+-0.261)        |          703.019 (+-25.805)          |      8.878 (+-0.000)
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=True      |   412.353 (+-4.476)    |        462.230 (+-1.983)        |         1101.673 (+-49.299)          |      2.383 (+-0.000)
      3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=False     |                        |        327.973 (+-1.852)        |          941.062 (+-5.549)           |      2.869 (+-0.000)

      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=True        |    61.191 (+-0.926)    |         80.795 (+-0.518)        |          160.853 (+-1.506)           |      1.991 (+-0.000)
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=True      |   134.488 (+-2.129)    |        169.147 (+-1.324)        |          327.343 (+-2.846)           |      1.935 (+-0.000)
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=True    |  1037.045 (+-24.982)   |        938.623 (+-9.010)        |         2603.360 (+-20.530)          |      2.774 (+-0.000)
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=True        |    52.792 (+-0.613)    |         73.692 (+-0.264)        |          131.829 (+-1.333)           |      1.789 (+-0.000)
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=True      |   139.596 (+-1.944)    |        173.778 (+-1.039)        |          320.063 (+-2.562)           |      1.842 (+-0.000)
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=True    |   690.132 (+-10.946)   |        772.758 (+-2.864)        |         2036.860 (+-36.109)          |      2.636 (+-0.000)
      3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=False       |                        |         78.747 (+-0.799)        |          158.479 (+-1.702)           |      2.013 (+-0.000)
      3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=False     |                        |        167.046 (+-1.077)        |          322.104 (+-2.764)           |      1.928 (+-0.000)
      3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=False   |                        |        918.967 (+-5.251)        |         2611.388 (+-29.917)          |      2.842 (+-0.000)
      3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=False       |                        |         55.336 (+-0.251)        |          113.869 (+-1.243)           |      2.058 (+-0.000)
      3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=False     |                        |        156.505 (+-1.095)        |          299.861 (+-2.710)           |      1.916 (+-0.000)
      3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=False   |                        |        514.344 (+-1.905)        |         1776.796 (+-19.660)          |      3.454 (+-0.000)

```

Note: there is no perf regression for the other cases. Some cases (see Source below) show small speed-ups; for the rest the ratio is roughly 1.0 +/- 0.1, which can be attributed to measurement noise.

[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230329-181023-pr_vs_nightly-speedup-md)

## Context

- #90771

Pull Request resolved: #96848
Approved by: https://github.com/NicolasHug, https://github.com/peterbell10