Commit 75c190b
Update base for Update on "Improved perfs for vectorized bilinear interpolate cpu uint8 RGB-case (channels last)"
## Description
- Based on #96651
- Improved performance of the vectorized **bilinear** interpolation for the uint8 RGB case, **channels last**
  - unified the RGB and RGBA processing code such that RGB input is no longer copied into an RGBA buffer
- Performance is now closer to Pillow-SIMD (labeled as `Pillow (9.0.0.post1)` in the results)
  - RGBA-case performance is unchanged after the refactoring (see the Source link below)
- Fixed memory pointer alignment and added more comments (addressing reviews from #96651)
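For context, the code path this PR optimizes is reachable from Python via `torch.nn.functional.interpolate`; the sketch below (illustrative only, with sizes borrowed from the benchmark table) shows how to hit the uint8, channels-last, bilinear case:

```python
# Sketch: exercising the uint8, channels-last, bilinear resize path
# that this PR optimizes (tensor sizes taken from the benchmarks below).
import torch
import torch.nn.functional as F

# Random 256x256 RGB uint8 image, NCHW tensor in channels-last memory format
img = torch.randint(0, 256, (1, 3, 256, 256), dtype=torch.uint8)
img = img.contiguous(memory_format=torch.channels_last)

# Bilinear downsample with antialiasing, staying in uint8 end to end
out = F.interpolate(img, size=(224, 224), mode="bilinear", antialias=True)
print(out.shape, out.dtype)  # torch.Size([1, 3, 224, 224]) torch.uint8
```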
## Results
- `Pillow (9.0.0.post1)` == Pillow-SIMD
```
[-------------------------------------------------------------------------------------------------- Resize -------------------------------------------------------------------------------------------------]
| Pillow (9.0.0.post1) | torch (2.1.0a0+gitd6e220c) PR | torch (2.1.0a0+git2b75955) nightly | Speed-up: PR vs nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=True | 38.674 (+-0.323) | 57.591 (+-0.244) | 131.033 (+-1.448) | 2.275 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (32, 32) aa=False | | 39.471 (+-0.166) | 113.911 (+-1.736) | 2.886 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=True | 128.512 (+-1.916) | 161.592 (+-1.242) | 299.679 (+-2.099) | 1.855 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (224, 224) aa=False | | 150.994 (+-1.180) | 285.331 (+-1.919) | 1.890 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=True | 180.045 (+-2.223) | 220.581 (+-1.363) | 431.057 (+-3.536) | 1.954 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (320, 320) aa=False | | 219.391 (+-1.409) | 429.410 (+-3.620) | 1.957 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=True | 113.911 (+-1.024) | 129.457 (+-1.295) | 459.610 (+-13.322) | 3.550 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (32, 32) aa=False | | 59.800 (+-0.199) | 400.015 (+-11.815) | 6.689 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=True | 283.050 (+-2.664) | 339.143 (+-1.209) | 683.555 (+-4.466) | 2.016 (+-0.000)
3 torch.uint8 channels_last bilinear (520, 520) -> (224, 224) aa=False | | 250.601 (+-1.236) | 603.545 (+-2.644) | 2.408 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=True | 186.723 (+-2.213) | 199.960 (+-1.343) | 860.867 (+-21.763) | 4.305 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (32, 32) aa=False | | 79.188 (+-0.261) | 703.019 (+-25.805) | 8.878 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=True | 412.353 (+-4.476) | 462.230 (+-1.983) | 1101.673 (+-49.299) | 2.383 (+-0.000)
3 torch.uint8 channels_last bilinear (712, 712) -> (224, 224) aa=False | | 327.973 (+-1.852) | 941.062 (+-5.549) | 2.869 (+-0.000)
3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=True | 61.191 (+-0.926) | 80.795 (+-0.518) | 160.853 (+-1.506) | 1.991 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=True | 134.488 (+-2.129) | 169.147 (+-1.324) | 327.343 (+-2.846) | 1.935 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=True | 1037.045 (+-24.982) | 938.623 (+-9.010) | 2603.360 (+-20.530) | 2.774 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=True | 52.792 (+-0.613) | 73.692 (+-0.264) | 131.829 (+-1.333) | 1.789 (+-0.000)
3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=True | 139.596 (+-1.944) | 173.778 (+-1.039) | 320.063 (+-2.562) | 1.842 (+-0.000)
3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=True | 690.132 (+-10.946) | 772.758 (+-2.864) | 2036.860 (+-36.109) | 2.636 (+-0.000)
3 torch.uint8 channels_last bilinear (64, 64) -> (224, 224) aa=False | | 78.747 (+-0.799) | 158.479 (+-1.702) | 2.013 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (270, 268) aa=False | | 167.046 (+-1.077) | 322.104 (+-2.764) | 1.928 (+-0.000)
3 torch.uint8 channels_last bilinear (256, 256) -> (1024, 1024) aa=False | | 918.967 (+-5.251) | 2611.388 (+-29.917) | 2.842 (+-0.000)
3 torch.uint8 channels_last bilinear (224, 224) -> (64, 64) aa=False | | 55.336 (+-0.251) | 113.869 (+-1.243) | 2.058 (+-0.000)
3 torch.uint8 channels_last bilinear (270, 268) -> (224, 224) aa=False | | 156.505 (+-1.095) | 299.861 (+-2.710) | 1.916 (+-0.000)
3 torch.uint8 channels_last bilinear (1024, 1024) -> (256, 256) aa=False | | 514.344 (+-1.905) | 1776.796 (+-19.660) | 3.454 (+-0.000)
```
Note: There is no perf regression for the other cases. A few cases (see Source below) show small speed-ups; for the rest the ratio is roughly 1.0 +/- 0.1, which can be attributed to measurement noise.
[Source](https://gist.github.com/vfdev-5/1c0778904a07ce40401306548b9525e8#file-20230329-181023-pr_vs_nightly-speedup-md)
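Timings like the ones above are typically gathered with `torch.utils.benchmark`; the snippet below is an assumed reconstruction of a single benchmark entry (the actual script is in the linked gist), not the exact harness used:

```python
# Hedged sketch of one benchmark entry using torch.utils.benchmark;
# the real script behind the table above is in the linked gist.
import torch
import torch.nn.functional as F
from torch.utils import benchmark

img = torch.randint(0, 256, (1, 3, 520, 520), dtype=torch.uint8)
img = img.contiguous(memory_format=torch.channels_last)

timer = benchmark.Timer(
    stmt="F.interpolate(img, size=(224, 224), mode='bilinear', antialias=True)",
    globals={"F": F, "img": img},
    num_threads=1,
    label="Resize",
)
measurement = timer.blocked_autorange(min_run_time=0.2)
print(measurement.median * 1e6)  # median time per call, in microseconds
```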
## Context
- #90771
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 datumbox pmeier
[ghstack-poisoned]
449 files changed, +36568 −7404 lines changed