Commit bd39def
Update on "Improved perfs for vectorized bilinear interpolate cpu uint8 RGB-case (channels last)"
## Description
- Based on #96651
- Improved performance for the vectorized **bilinear** interpolation uint8 RGB case, **channels last** (a usage sketch of the affected call follows this list)
- Unified the RGB and RGBA processing code so that RGB input is no longer copied into an RGBA buffer
- Performance is now closer to Pillow-SIMD (labeled as `Pillow (9.0.0.post1)` in the results)
- RGBA performance is unchanged after the refactoring (see the Source link below)
- Fixed memory pointer alignment and added more comments (addressing reviews from #96651)
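As a minimal sketch (not taken from the PR itself, and assuming the public `torch.nn.functional.interpolate` entry point, which dispatches to this CPU kernel), the following call pattern exercises the optimized path: a uint8, 3-channel, channels-last tensor resized with `mode="bilinear"`:

```python
import torch
import torch.nn.functional as F

# uint8 RGB image in channels-last memory format: the case this PR optimizes
x = torch.randint(0, 256, (1, 3, 256, 256), dtype=torch.uint8)
x = x.contiguous(memory_format=torch.channels_last)

# Both the antialiased and non-antialiased branches appear in the results below
out_aa = F.interpolate(x, size=(224, 224), mode="bilinear", antialias=True)
out = F.interpolate(x, size=(224, 224), mode="bilinear", antialias=False)

assert out_aa.dtype == torch.uint8  # output stays uint8; no float round-trip
```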
## Results
- `Pillow (9.0.0.post1)` == Pillow-SIMD
```
[-------------------------------------------------------------------------------------------------- Resize -------------------------------------------------------------------------------------------------]
| Pillow (9.0.0.post1) | torch (2.1.0a0+gitce4be01) PR | torch (2.1.0a0+git5309c44) nightly | Speed-up: PR vs nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=True | 38.548 (+-0.280) | 57.536 (+-0.210) | 132.147 (+-1.236) | 2.297 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=False | | 38.532 (+-0.219) | 111.789 (+-1.175) | 2.901 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=True | 127.689 (+-1.348) | 156.262 (+-1.213) | 302.518 (+-2.632) | 1.936 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=False | | 145.483 (+-1.077) | 286.663 (+-2.494) | 1.970 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=True | 178.117 (+-1.956) | 215.053 (+-1.470) | 439.375 (+-4.014) | 2.043 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=False | | 211.340 (+-2.239) | 438.537 (+-4.143) | 2.075 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=True | 112.593 (+-1.266) | 130.414 (+-1.633) | 446.804 (+-3.283) | 3.426 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=False | | 58.767 (+-0.203) | 374.244 (+-13.598) | 6.368 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=True | 283.210 (+-2.937) | 324.157 (+-1.895) | 720.197 (+-3.467) | 2.222 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=False | | 239.800 (+-2.492) | 592.834 (+-3.903) | 2.472 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=True | 186.255 (+-1.629) | 204.834 (+-1.496) | 787.868 (+-3.648) | 3.846 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=False | | 77.335 (+-0.341) | 651.016 (+-3.926) | 8.418 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=True | 410.286 (+-2.439) | 443.934 (+-2.899) | 1123.923 (+-14.988) | 2.532 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=False | | 312.220 (+-2.307) | 915.347 (+-4.486) | 2.932 (+-0.000)
# More test-cases from #90771
3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=True | 60.611 (+-0.337) | 80.849 (+-1.780) | 170.465 (+-1.830) | 2.108 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=True | 132.971 (+-1.624) | 164.892 (+-1.426) | 330.971 (+-3.249) | 2.007 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=True | 948.467 (+-3.179) | 891.414 (+-5.282) | 2805.510 (+-25.503) | 3.147 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=True | 52.539 (+-0.327) | 72.471 (+-0.367) | 135.933 (+-1.625) | 1.876 (+-0.000)
3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=True | 138.669 (+-1.867) | 168.628 (+-1.213) | 321.112 (+-2.904) | 1.904 (+-0.000)
3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=True | 689.933 (+-3.175) | 746.911 (+-2.985) | 2050.880 (+-22.188) | 2.746 (+-0.000)
3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=False | | 78.347 (+-0.338) | 169.646 (+-1.640) | 2.165 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=False | | 162.194 (+-1.089) | 329.754 (+-2.590) | 2.033 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=False | | 894.476 (+-2.738) | 2815.870 (+-22.589) | 3.148 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=False | | 52.728 (+-0.406) | 112.024 (+-1.225) | 2.125 (+-0.000)
3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=False | | 151.560 (+-1.128) | 299.152 (+-3.353) | 1.974 (+-0.000)
3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=False | | 500.053 (+-4.288) | 1698.601 (+-16.785) | 3.397 (+-0.000)
```
Note: there is no performance regression for the other cases. A few cases (see the Source link below) show small speed-ups; for the rest, the ratio is roughly 1.0 +/- 0.1, which may be attributed to measurement noise.
[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230322-132441-pr_vs_nightly-speedup-md)
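For context, a single row of the table can be reproduced along these lines with `torch.utils.benchmark` (a hedged sketch; the actual harness and the full set of configurations live in the gist linked above, so the sizes and labels here are illustrative):

```python
import torch
import torch.nn.functional as F
import torch.utils.benchmark as benchmark

x = torch.randint(0, 256, (1, 3, 256, 256), dtype=torch.uint8)
x = x.contiguous(memory_format=torch.channels_last)

# Single-threaded timing, matching the "1 threads:" section of the table
timer = benchmark.Timer(
    stmt="F.interpolate(x, size=(224, 224), mode='bilinear', antialias=True)",
    globals={"F": F, "x": x},
    num_threads=1,
    label="Resize",
    description="torch PR",
)
print(timer.blocked_autorange(min_run_time=1))
```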
## Context
- #90771
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10
[ghstack-poisoned]

1 file changed: +21 −22 lines changed