Commit c543034
add cuda sync when ops running on gpu (#29936)
Summary:
Pull Request resolved: #29936
This diff adds synchronization after op execution to ensure all the cuda streams complete.
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 154.412
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 101.115
...
Reviewed By: hl475
Differential Revision: D18542732
fbshipit-source-id: b979d26a174f488e971074dc1e16b00e17179c801 parent f1860ae commit c543034
File tree
2 files changed
+9
-4
lines changed- benchmarks/operator_benchmark
2 files changed
+9
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
249 | 249 | | |
250 | 250 | | |
251 | 251 | | |
| 252 | + | |
252 | 253 | | |
253 | 254 | | |
254 | 255 | | |
255 | | - | |
| 256 | + | |
256 | 257 | | |
257 | 258 | | |
258 | 259 | | |
259 | 260 | | |
260 | 261 | | |
261 | 262 | | |
262 | | - | |
| 263 | + | |
263 | 264 | | |
264 | 265 | | |
265 | 266 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
127 | 127 | | |
128 | 128 | | |
129 | 129 | | |
130 | | - | |
| 130 | + | |
131 | 131 | | |
132 | 132 | | |
133 | 133 | | |
| |||
147 | 147 | | |
148 | 148 | | |
149 | 149 | | |
150 | | - | |
| 150 | + | |
151 | 151 | | |
152 | 152 | | |
153 | 153 | | |
154 | 154 | | |
155 | 155 | | |
156 | 156 | | |
| 157 | + | |
| 158 | + | |
157 | 159 | | |
158 | 160 | | |
159 | 161 | | |
160 | 162 | | |
161 | 163 | | |
| 164 | + | |
| 165 | + | |
162 | 166 | | |
163 | 167 | | |
164 | 168 | | |
| |||
0 commit comments