Commit b04f609
[aten] Call fbgemm functions for embedding prepack/unpack
Pull Request resolved: #44845
The fbgemm functions are vectorized and substantially faster than the existing ATen implementations (see the before/after benchmarks below).
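The operators affected are the ones exercised by the benchmarks below. As a minimal sketch (shapes chosen to match the benchmark inputs, not taken from the diff), they can be invoked directly through the quantized op namespace:

```python
import torch

# 80 x 128 float weight, matching the smallest benchmark case below.
weight = torch.randn(80, 128, dtype=torch.float32)

# Rowwise 8-bit prepack and the matching unpack, which dequantizes
# back to a float tensor (approximately, since quantization is lossy).
packed = torch.ops.quantized.embedding_bag_byte_prepack(weight)
unpacked = torch.ops.quantized.embedding_bag_byte_unpack(packed)

# The 4-bit and 2-bit variants follow the same pattern.
packed_4bit = torch.ops.quantized.embedding_bag_4bit_prepack(weight)
packed_2bit = torch.ops.quantized.embedding_bag_2bit_prepack(weight)
```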
```
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/6473924484856786
Summary (total time 15.08s):
PASS: 7
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Performance Before:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 68.727
# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 131.500
# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 248.190
# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 172.742
# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 333.008
# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 652.423
# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 167.282
# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 398.901
# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 785.254
# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 122.653
# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 230.617
# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 408.807
# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 176.087
# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 337.514
# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 659.716
# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 342.529
# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 665.197
# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 1307.923
```
Performance After:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 10.782
# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 17.443
# Benchmarking PyTorch: qembeddingbag_byte_prepack
# Mode: Eager
# Name: qembeddingbag_byte_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 25.898
# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 13.903
# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 18.575
# Benchmarking PyTorch: qembeddingbag_4bit_prepack
# Mode: Eager
# Name: qembeddingbag_4bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 30.650
# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 14.158
# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 19.818
# Benchmarking PyTorch: qembeddingbag_2bit_prepack
# Mode: Eager
# Name: qembeddingbag_2bit_prepack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 30.852
# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 47.596
# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 91.025
# Benchmarking PyTorch: qembeddingbag_byte_unpack
# Mode: Eager
# Name: qembeddingbag_byte_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 131.425
# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 12.637
# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 20.856
# Benchmarking PyTorch: qembeddingbag_4bit_unpack
# Mode: Eager
# Name: qembeddingbag_4bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 33.944
# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim128
# Input: num_embeddings: 80, embedding_dim: 128
Forward Execution Time (us) : 21.181
# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim256
# Input: num_embeddings: 80, embedding_dim: 256
Forward Execution Time (us) : 34.213
# Benchmarking PyTorch: qembeddingbag_2bit_unpack
# Mode: Eager
# Name: qembeddingbag_2bit_unpack_num_embeddings80_embedding_dim512
# Input: num_embeddings: 80, embedding_dim: 512
Forward Execution Time (us) : 59.622
```
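Net effect across the cases above: the prepack kernels improve by roughly 6-25x (e.g. qembeddingbag_byte_prepack at embedding_dim 512: 248.190 us -> 25.898 us, ~9.6x; qembeddingbag_2bit_prepack at embedding_dim 512: 785.254 us -> 30.852 us, ~25x) and the unpack kernels by roughly 2.5-22x (the largest win is qembeddingbag_2bit_unpack at embedding_dim 512: 1307.923 us -> 59.622 us, ~21.9x). A quick way to spot-check a single case outside the operator micro-benchmark harness (the `bench_us` helper below is a hypothetical stand-in, not part of the suite):

```python
import time
import torch

def bench_us(fn, *args, iters=1000):
    # Hypothetical helper: crude average wall-clock time per call, in us.
    fn(*args)  # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters * 1e6

weight = torch.randn(80, 512, dtype=torch.float32)
print("byte_prepack:",
      bench_us(torch.ops.quantized.embedding_bag_byte_prepack, weight), "us")
```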
ghstack-source-id: 112812505
Differential Revision: [D23675777](https://our.internmc.facebook.com/intern/diff/D23675777/)
File tree: 2 files changed (+30, -4) in aten/src/ATen/native/quantized/cpu; each file has 15 additions & 2 deletions.