Commit 76c08bc
committed
Update base for Update on "Improved perfs for vectorized bilinear interpolate cpu uint8 RGB-case (channels last)"
## Description
- Based on #96651
- Improved perfs for vectorized **bilinear** interpolate uint8 RGB-case, **channels last**
- unified RGB and RGBA processing code such that RGB input is not copied into RGBA
- Performances are more close to Pillow-SIMD (labeled as `Pillow (9.0.0.post1)` in the results)
- RGBA case perfs are the same after refactoring (see Source link below)
- Fixed mem pointer alignment, added more comments (reviews from #96651)
## Results
- `Pillow (9.0.0.post1)` == Pillow-SIMD
```
[-------------------------------------------------------------------------------------------------- Resize -------------------------------------------------------------------------------------------------]
| Pillow (9.0.0.post1) | torch (2.1.0a0+gitce4be01) PR | torch (2.1.0a0+git5309c44) nightly | Speed-up: PR vs nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=True | 38.548 (+-0.280) | 57.536 (+-0.210) | 132.147 (+-1.236) | 2.297 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=False | | 38.532 (+-0.219) | 111.789 (+-1.175) | 2.901 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=True | 127.689 (+-1.348) | 156.262 (+-1.213) | 302.518 (+-2.632) | 1.936 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=False | | 145.483 (+-1.077) | 286.663 (+-2.494) | 1.970 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=True | 178.117 (+-1.956) | 215.053 (+-1.470) | 439.375 (+-4.014) | 2.043 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=False | | 211.340 (+-2.239) | 438.537 (+-4.143) | 2.075 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=True | 112.593 (+-1.266) | 130.414 (+-1.633) | 446.804 (+-3.283) | 3.426 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=False | | 58.767 (+-0.203) | 374.244 (+-13.598) | 6.368 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=True | 283.210 (+-2.937) | 324.157 (+-1.895) | 720.197 (+-3.467) | 2.222 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=False | | 239.800 (+-2.492) | 592.834 (+-3.903) | 2.472 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=True | 186.255 (+-1.629) | 204.834 (+-1.496) | 787.868 (+-3.648) | 3.846 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=False | | 77.335 (+-0.341) | 651.016 (+-3.926) | 8.418 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=True | 410.286 (+-2.439) | 443.934 (+-2.899) | 1123.923 (+-14.988) | 2.532 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=False | | 312.220 (+-2.307) | 915.347 (+-4.486) | 2.932 (+-0.000)
# More test-cases from #90771
3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=True | 60.611 (+-0.337) | 80.849 (+-1.780) | 170.465 (+-1.830) | 2.108 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=True | 132.971 (+-1.624) | 164.892 (+-1.426) | 330.971 (+-3.249) | 2.007 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=True | 948.467 (+-3.179) | 891.414 (+-5.282) | 2805.510 (+-25.503) | 3.147 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=True | 52.539 (+-0.327) | 72.471 (+-0.367) | 135.933 (+-1.625) | 1.876 (+-0.000)
3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=True | 138.669 (+-1.867) | 168.628 (+-1.213) | 321.112 (+-2.904) | 1.904 (+-0.000)
3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=True | 689.933 (+-3.175) | 746.911 (+-2.985) | 2050.880 (+-22.188) | 2.746 (+-0.000)
3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=False | | 78.347 (+-0.338) | 169.646 (+-1.640) | 2.165 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=False | | 162.194 (+-1.089) | 329.754 (+-2.590) | 2.033 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=False | | 894.476 (+-2.738) | 2815.870 (+-22.589) | 3.148 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=False | | 52.728 (+-0.406) | 112.024 (+-1.225) | 2.125 (+-0.000)
3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=False | | 151.560 (+-1.128) | 299.152 (+-3.353) | 1.974 (+-0.000)
3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=False | | 500.053 (+-4.288) | 1698.601 (+-16.785) | 3.397 (+-0.000)
```
Note: There is no perf regression for other case. There some cases (see Source below) with small speed-ups, for the rest it is roughly around 1.0 +/- 0.1 which may be attributed to noisy measurements ...
[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230322-132441-pr_vs_nightly-speedup-md)
## Context
- #90771
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10
[ghstack-poisoned]1 parent 036ed6c commit 76c08bc
1 file changed
+19
-17
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
39 | 39 | | |
40 | 40 | | |
41 | 41 | | |
42 | | - | |
43 | | - | |
44 | | - | |
45 | 42 | | |
46 | 43 | | |
47 | | - | |
48 | | - | |
| 44 | + | |
| 45 | + | |
49 | 46 | | |
50 | 47 | | |
51 | 48 | | |
| |||
71 | 68 | | |
72 | 69 | | |
73 | 70 | | |
74 | | - | |
| 71 | + | |
75 | 72 | | |
76 | 73 | | |
77 | 74 | | |
| |||
315 | 312 | | |
316 | 313 | | |
317 | 314 | | |
318 | | - | |
| 315 | + | |
319 | 316 | | |
320 | 317 | | |
321 | | - | |
| 318 | + | |
| 319 | + | |
| 320 | + | |
| 321 | + | |
322 | 322 | | |
323 | 323 | | |
324 | | - | |
| 324 | + | |
325 | 325 | | |
326 | 326 | | |
327 | 327 | | |
328 | | - | |
329 | | - | |
| 328 | + | |
| 329 | + | |
330 | 330 | | |
331 | 331 | | |
332 | 332 | | |
333 | 333 | | |
334 | | - | |
| 334 | + | |
335 | 335 | | |
336 | 336 | | |
337 | 337 | | |
338 | 338 | | |
339 | | - | |
| 339 | + | |
340 | 340 | | |
341 | 341 | | |
342 | 342 | | |
| |||
347 | 347 | | |
348 | 348 | | |
349 | 349 | | |
350 | | - | |
| 350 | + | |
351 | 351 | | |
352 | 352 | | |
353 | 353 | | |
| |||
359 | 359 | | |
360 | 360 | | |
361 | 361 | | |
362 | | - | |
| 362 | + | |
363 | 363 | | |
364 | 364 | | |
365 | 365 | | |
| |||
382 | 382 | | |
383 | 383 | | |
384 | 384 | | |
385 | | - | |
| 385 | + | |
| 386 | + | |
386 | 387 | | |
387 | 388 | | |
388 | 389 | | |
| |||
549 | 550 | | |
550 | 551 | | |
551 | 552 | | |
552 | | - | |
| 553 | + | |
| 554 | + | |
553 | 555 | | |
554 | 556 | | |
555 | 557 | | |
| |||
0 commit comments