optimize masked_fill on CPU #11359

mingfeima · 2018-09-07T01:57:53Z

This PR parallelizes masked_fill on CPU; currently it runs sequentially.

The following script is used to benchmark and verify this PR. On a Xeon Skylake 8180 (2 sockets * 28 cores),
it runs in 4.20 sec without the PR and 0.11 sec with the PR.

import torch
import random
from time import time

size = 10 * 1000 * 1000
count = 100

def test_masked_fill():
    dst = torch.randn(size)
    dst_ = dst.clone()
    mask = torch.rand(size).mul(2).floor().byte()
    val = random.random()

    tstart = time()
    for i in range(count):
        dst.masked_fill_(mask, val)
    tend = time()
    print("masked_fill_: %f" % (tend-tstart))

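    # Verify: masked positions must hold val, all others must be unchanged.
    ok = True
    for i in range(size):
        if mask[i]:
            if dst[i] != val:
                print("fail")
                ok = False
        else:
            if dst[i] != dst_[i]:
                print("fail1")
                ok = False
    # Report PASS only when no mismatch was found.
    if ok:
        print("test_masked_fill: PASS")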

test_masked_fill()
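
The element-by-element check above walks all 10 million entries in a Python loop, which can take far longer than the benchmark itself. A vectorized check is one alternative. The sketch below is not part of the PR: `check_masked_fill` is a hypothetical helper name, and it assumes a recent PyTorch, where the mask is expected to be a `torch.bool` tensor rather than the `byte` mask used in the script.

```python
import random

import torch


def check_masked_fill(size=1000):
    """Return True if masked_fill_ matches a reference built with torch.where."""
    dst = torch.randn(size)
    original = dst.clone()
    # Assumption: a boolean mask, as expected by recent PyTorch releases.
    mask = torch.rand(size) < 0.5
    val = random.random()

    dst.masked_fill_(mask, val)

    # Reference result: val where the mask is set, the original value elsewhere.
    expected = torch.where(mask, torch.full_like(original, val), original)
    return torch.equal(dst, expected)


if __name__ == "__main__":
    print("test_masked_fill:", "PASS" if check_masked_fill() else "FAIL")
```

Because the reference is computed in a single tensor operation, the same check can be run at the full benchmark size without a noticeable wait.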

aten/src/TH/generic/THTensorEvenMoreMath.cpp

+  }
+#else
+  serial_path = 1;
+#endif
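
The fragment above shows only the tail of the change, so the actual parallel loop is not visible here. As a rough illustration of why masked fill splits cleanly across workers, the pure-Python sketch below partitions the index range into chunks and fills each chunk independently. It is not the PR's implementation (which is C code in TH); `parallel_masked_fill`, `masked_fill_chunk`, and `num_workers` are hypothetical names, and Python threads are used only to show the partitioning, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def masked_fill_chunk(dst, mask, val, start, stop):
    """Fill dst[i] with val for every i in [start, stop) where mask[i] is set."""
    for i in range(start, stop):
        if mask[i]:
            dst[i] = val


def parallel_masked_fill(dst, mask, val, num_workers=4):
    """Split the index range into contiguous chunks and fill them concurrently.

    Each element depends only on its own mask entry, so chunks never
    touch the same index and no locking is needed.
    """
    n = len(dst)
    # Ceiling division; max() guards against a zero step for empty input.
    chunk = max(1, (n + num_workers - 1) // num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(masked_fill_chunk, dst, mask, val, s, min(s + chunk, n))
            for s in range(0, n, chunk)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return dst


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(parallel_masked_fill(data, [1, 0, 1, 0, 1], 9.0, num_workers=2))
```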


mingfeima · 2018-09-07T05:02:35Z

Some caffe2 CI jobs failed with "could not create cache path /usr/local/caffe2/lib/python2.7/dist-packages/caffe2/python/.pytest_cache/v/cache/lastfailed".
Some pytorch CI jobs failed on test_all_reduce_product from test_distributed.py.
I can't reproduce the failures locally. Can someone give me some guidance?

ssnl · 2018-09-07T06:08:19Z

@mingfeima Ignore the CircleCI ones. They are experimental.

facebook-github-bot

ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary:
This PR parallels `masked_fill` on CPU, currently it runs in sequential on CPU.

the following script is used to benchmark and verify this PR. On Xeon skylake 8180 (2 sockets * 28 cores), it runs `4.20` sec without the PR and `0.11` sec with the PR.

```python
import torch
import random
from time import time

size = 10 * 1000 * 1000
count = 100

def test_masked_fill():
    dst = torch.randn(size)
    dst_ = dst.clone()
    mask = torch.rand(size).mul(2).floor().byte()
    val = random.random()

    tstart = time()
    for i in range(count):
        dst.masked_fill_(mask, val)
    tend = time()
    print("masked_fill_: %f" % (tend-tstart))

    for i in range(size):
        if mask[i]:
            if dst[i] != val:
                print("fail")
        else:
            if dst[i] != dst_[i]:
                print("fail1")
    print("test_masked_fill: PASS")

test_masked_fill()
```

Pull Request resolved: pytorch/pytorch#11359

Differential Revision: D9735578

Pulled By: ezyang

fbshipit-source-id: d437ad7c6dace1910d0c18d6d9ede80efb44fae4

optimize masked_fill on CPU

95df2f4

mingfeima requested review from apaszke, colesbury, ezyang, gchanan, soumith and zdevito as code owners September 7, 2018 01:57

apaszke reviewed Sep 7, 2018

remove serial_path

abde655

ezyang approved these changes Sep 9, 2018

facebook-github-bot reviewed Sep 9, 2018

facebook-github-bot closed this in 1b94f5c Sep 9, 2018

fmassa mentioned this pull request Oct 4, 2018

Bug in masked_fill_ for non contiguous tensors #12230

Closed
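
The linked issue concerns non-contiguous tensors, a case the benchmark script in this PR does not exercise because it only uses a 1-D contiguous tensor. The sketch below shows one way such a case could be covered. It is not taken from the issue or from the PyTorch test suite: `check_noncontiguous` is a hypothetical helper, and it assumes a recent PyTorch with `torch.bool` masks.

```python
import torch


def check_noncontiguous(rows=64, cols=48, val=1.5):
    """Return True if masked_fill_ is correct on a transposed (strided) view."""
    base = torch.randn(rows, cols)
    dst = base.t()  # transposed view of shape (cols, rows), not contiguous
    assert not dst.is_contiguous()

    original = dst.clone()
    # Assumption: a boolean mask with the same shape as the view.
    mask = torch.rand(cols, rows) < 0.5

    dst.masked_fill_(mask, val)

    # Reference result: val where the mask is set, the original value elsewhere.
    expected = torch.where(mask, torch.full_like(original, val), original)
    return torch.equal(dst, expected)


if __name__ == "__main__":
    print("noncontiguous masked_fill_:", "PASS" if check_noncontiguous() else "FAIL")
```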

ezyang added open source merged labels Jun 24, 2019

5 participants
