Speed up CPU threshold and relu implementation #13182

colesbury · 2018-10-26T21:08:38Z

The previous threshold implementation was neither vectorized nor parallelized.
This change speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms.

CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8

1 thread (before vs. after)
10240:  17.4 µs vs. 6.9 µs per loop
102400: 141 µs vs. 39.8 µs per loop

16 threads (before vs. after)
10240:  17.4 µs vs. 6.7 µs per loop
102400: 141 µs vs. 14.3 µs per loop

CUDA timings are not measurably different.

[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
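
For readers unfamiliar with the op being optimized, here is a minimal pure-Python sketch of the elementwise semantics of threshold and relu. It is only a reference for what the op computes, not the ATen kernel from this PR, and the list-based signatures are assumptions made for illustration.

```python
def threshold(xs, threshold_value, value):
    """Keep each x that is strictly greater than threshold_value; otherwise substitute value."""
    return [x if x > threshold_value else value for x in xs]


def relu(xs):
    # relu is the special case threshold_value = 0, value = 0.
    return threshold(xs, 0, 0)
```

Because each output element depends only on the matching input element, a loop like this can be split across SIMD lanes and threads without coordination, which is the property the description relies on.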

facebook-github-bot

colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

aten/src/ATen/native/native_functions.yaml

+  variants: function
+
+- func: threshold_(Tensor self, Scalar threshold, Scalar value) -> Tensor
+  variants: function


aten/src/ATen/native/native_functions.yaml

+  variants: function
+
+- func: threshold_out(Tensor result, Tensor self, Scalar threshold, Scalar value) -> Tensor
+  variants: function


tools/autograd/derivatives.yaml

-  self: threshold_backward(grad, self, 0, 0)
+  self: threshold_backward(grad, self, 0)
+
+- name: relu_(Tensor self)


tools/autograd/derivatives.yaml


- name: threshold_forward(Tensor self, Scalar threshold, Scalar value)
-  self: threshold_backward(grad, self, threshold, value)
+- name: threshold(Tensor self, Scalar threshold, Scalar value)
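
The derivatives.yaml hunks above drop the `value` argument from `threshold_backward`. A minimal pure-Python sketch of why that is sufficient: the gradient passes through where the input exceeded the threshold and is zero elsewhere, so the replacement value never enters the formula. The list-based signature is an assumption for illustration and is not the ATen function.

```python
def threshold_backward(grads, xs, threshold_value):
    """Pass grad through where x > threshold_value; the gradient is zero elsewhere."""
    return [g if x > threshold_value else 0.0 for g, x in zip(grads, xs)]


# relu's backward corresponds to threshold_value = 0.
relu_grad = threshold_backward([1.0, 1.0, 1.0], [-1.0, 0.0, 2.0], 0)
```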


facebook-github-bot

@colesbury is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary:
```
The previous threshold implementation was not vectorized or parallelized.
This speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms

CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8

1 thread (before vs. after)
10240:  17.4 µs vs. 6.9 µs per loop
102400: 141 µs vs. 39.8 µs per loop

16 threads (before vs. after)
10240:  17.4 µs vs. 6.7 µs per loop
102400: 141 µs vs. 14.3 µs per loop

CUDA timings are not measurably different.

[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
```
Pull Request resolved: pytorch/pytorch#13182
Reviewed By: soumith
Differential Revision: D12825105
Pulled By: colesbury
fbshipit-source-id: 557da608ebb87db8a04adbb0d2882af4f2eb3c15

colesbury requested a review from gchanan October 26, 2018 21:25

colesbury added 2 commits October 26, 2018 16:45

Fix threshold gradient calculation

cb810e8

Remove Threshold functions from THCUNN.h header

4e80f4d

facebook-github-bot reviewed Oct 29, 2018

Make relu() work on integral tensors

00cbcaa

We have some internal code that calls relu() on LongTensors. I'm not sure we deliberately intended to support that, but it's easy enough to make it keep working.
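
As a rough illustration of the behavior this commit preserves, the sketch below applies relu to a 64-bit integer buffer and keeps the integer element type. It uses Python's `array` module as a stand-in for a LongTensor; the helper name `relu_long` is hypothetical and this is not the PR's code.

```python
from array import array


def relu_long(xs):
    """Clamp negatives to zero while keeping the input's integer typecode."""
    return array(xs.typecode, (x if x > 0 else 0 for x in xs))


out = relu_long(array("q", [-3, 0, 7]))  # "q" is a signed 64-bit integer
```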

facebook-github-bot reviewed Oct 30, 2018

gchanan approved these changes Oct 31, 2018

Changes from review

c749852

facebook-github-bot reviewed Nov 1, 2018

Merge branch 'master' into threshold

9749381

facebook-github-bot reviewed Nov 2, 2018

facebook-github-bot reviewed Nov 5, 2018

facebook-github-bot closed this in 98f5c00 Nov 5, 2018

ezyang added the merged label Jun 25, 2019
