@mingzhe09088
Contributor

Summary: This diff adds a synchronization step after op execution to ensure that all CUDA streams have completed.
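To illustrate the idea, here is a minimal sketch of a timing loop that synchronizes after each op call. It is not the code from this diff: `time_op` and its `sync` parameter are hypothetical names, and placing the synchronization inside the loop is an assumption. GPU kernels are launched asynchronously, so without a synchronization the timer can stop before the queued work has finished.

```python
import time


def time_op(op, iterations=1, sync=None):
    """Return the mean wall-clock time per call of `op`, in microseconds.

    `sync` is an optional zero-argument callable invoked after every call of
    `op`. For a CUDA op it would block until the device has finished its
    queued work, so the measurement covers kernel execution and not only the
    kernel launch. (Hypothetical helper, not part of operator_benchmark.)
    """
    start = time.perf_counter()
    for _ in range(iterations):
        op()
        if sync is not None:
            sync()  # e.g. torch.cuda.synchronize when running on a GPU
    elapsed = time.perf_counter() - start
    return elapsed * 1e6 / iterations


if __name__ == "__main__":
    # CPU-only demonstration: no synchronization is needed here.
    mean_us = time_op(lambda: sum(range(1000)), iterations=10)
    print(f"Forward Execution Time (us) : {mean_us:.3f}")
```

With PyTorch, one would presumably pass `sync=torch.cuda.synchronize` when the inputs live on a CUDA device and leave `sync=None` for CPU runs.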

Test Plan:

```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 154.412

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 101.115
...
```

Reviewed By: hl475

Differential Revision: D18542732

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D18542732

Summary:
Pull Request resolved: #29936

fbshipit-source-id: b8c92c33472df7ec623ad7a81a27d416de5cbc0a

@facebook-github-bot
Contributor

This pull request has been merged in c543034.
