parallel max and min for ATen on CPU #10343
Conversation
cc @colesbury

Can someone take a look at the build failure:

Feel free to ignore that one. Do you have some benchmarks on this?
@ssnl sure, I wrote a small benchmark for max; the piece of code reduces an [N, T] tensor along dim=1:

```python
import torch
from time import time

N = 2000
T = 35820
warmups = 100
count = 200

a = torch.randn(N, T)

def test_max():
    for i in range(warmups):
        b, _ = a.max(dim=1)
    tstart = time()
    for i in range(count):
        b, _ = a.max(dim=1)
    tend = time()
    print("max reduction : %f ms" % ((tend - tstart) / count * 1000))

test_max()
```

I brought this up because I have been optimizing OpenNMT-py,
colesbury
left a comment
Nice speed-ups. LGTM with a few code style comments.
I'm working on changing how reductions are implemented and unifying some of the CPU and CUDA code, but it'll probably take a while, so this speed-up is very welcome.
```cpp
template <>
bool _isnan(float val) {
  return std::isnan(val);
}
```
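For context, here is a sketch of the plausible surrounding helper this specialization belongs to (an assumption, not the PR's exact code): a primary `_isnan` template that returns false for integral element types, with explicit specializations for the floating-point types deferring to `std::isnan`.

```cpp
#include <cmath>

// Sketch (assumed context): integral element types can never hold a NaN,
// so the primary template short-circuits to false; floating-point types
// defer to std::isnan.
template <typename T>
inline bool _isnan(T /*val*/) {
  return false;
}

template <>
inline bool _isnan(float val) {
  return std::isnan(val);
}

template <>
inline bool _isnan(double val) {
  return std::isnan(val);
}
```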
```cpp
#define isnan_break(val) \
```
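The macro name suggests an early exit from the reduction loop once a NaN has been seen, since max and min propagate NaN. A minimal sketch of that idea follows; the macro body and the `serial_row_max` helper are assumptions for illustration, not the PR's actual definitions.

```cpp
#include <cmath>
#include <cstdint>

// Assumed macro body: stop scanning once a NaN has been seen, because a
// max/min that already contains a NaN cannot change.
#define isnan_break(val) \
  if (std::isnan(val)) { \
    break;               \
  }

// Hypothetical serial helper showing where such a macro would be used.
float serial_row_max(const float* row, int64_t n) {
  float max_val = row[0];
  for (int64_t i = 1; i < n; ++i) {
    float v = row[i];
    if (std::isnan(v) || v > max_val) {
      max_val = v;  // NaN propagates into the running max
    }
    isnan_break(max_val);  // once max_val is NaN, stop early
  }
  return max_val;
}
```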
facebook-github-bot
left a comment
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: optimize max and min reduction for the ATen CPU path; the current code path from the TH module runs sequentially on CPU. Pull Request resolved: pytorch/pytorch#10343 Differential Revision: D9330799 Pulled By: ezyang fbshipit-source-id: 5b8271e0ca3e3e73f88a9075aa541c8756001b7c
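A minimal sketch of the overall technique the summary describes, assuming the `at::parallel_for` helper from `ATen/Parallel.h` and a contiguous 2-D float input; the `parallel_row_max` function below is illustrative, not the PR's actual kernel.

```cpp
#include <ATen/ATen.h>
#include <ATen/Parallel.h>
#include <cmath>

// Sketch: reduce each row of a contiguous [N, T] float tensor to its max,
// splitting the rows across threads with at::parallel_for. NaN propagates
// into the result, matching torch.max semantics.
at::Tensor parallel_row_max(const at::Tensor& self) {
  auto input = self.contiguous();
  int64_t N = input.size(0);
  int64_t T = input.size(1);
  auto result = at::empty({N}, input.options());

  const float* in = input.data_ptr<float>();
  float* out = result.data_ptr<float>();

  at::parallel_for(0, N, /*grain_size=*/1, [&](int64_t begin, int64_t end) {
    for (int64_t i = begin; i < end; ++i) {
      const float* row = in + i * T;
      float max_val = row[0];
      for (int64_t j = 1; j < T; ++j) {
        float v = row[j];
        if (std::isnan(v) || v > max_val) {
          max_val = v;
        }
      }
      out[i] = max_val;
    }
  });
  return result;
}
```

Splitting the outer (row) dimension across threads keeps each thread's inner loop over `T` contiguous and cache-friendly, which is the usual way such row-wise reductions are parallelized on CPU.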