Batched CPU reductions #39512

emcastillo · 2020-06-04T09:37:33Z

In some operations, simply doing a running reduction over a large array can cause precision issues that lead to incorrect results. This happens in the outer-reduction case: the reduction is parallelized across columns, so each output element is accumulated serially over the whole reduced dimension and there are no partial results.
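To make the failure mode concrete, here is a minimal C++ sketch (a standalone illustration, not code from Reduce.h; `running_sum` is a hypothetical helper) of a single float accumulator walking a long array, which is roughly what one output column sees in the outer-reduction case. Once the accumulator reaches 2^24 = 16777216, adding 1.0f rounds back to the same value, so the sum stops growing.

```cpp
#include <vector>

// Naive running sum: every element is added to one float accumulator,
// with no intermediate (partial) results.
float running_sum(const std::vector<float>& xs) {
  float acc = 0.0f;
  for (float x : xs) {
    acc += x;  // once acc is large, small x values are rounded away
  }
  return acc;
}
```

For example, summing 20,000,000 ones with this function returns 16777216 rather than 20000000, assuming IEEE-754 single precision and no `-ffast-math`-style reassociation.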

This PR batches these operations so that several intermediate results are kept, which gives more stable results and fewer precision issues. The changes are minimal, and I believe it is better to do this here than in a specific reduction routine such as sum.
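A minimal 1-D sketch of the batching idea, assuming the block size of 256 that this PR currently uses. `batched_sum` is a hypothetical name, and the real change operates on strided columns inside Reduce.h rather than on a flat `std::vector`. Each block gets a fresh accumulator, and only the per-block partial results are combined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Batched (blocked) sum: each block of `block_size` elements gets its own
// fresh accumulator, and the per-block partial results are combined at the end.
float batched_sum(const std::vector<float>& xs, std::size_t block_size = 256) {
  assert(block_size > 0);
  float total = 0.0f;
  for (std::size_t begin = 0; begin < xs.size(); begin += block_size) {
    const std::size_t end = std::min(begin + block_size, xs.size());
    float partial = 0.0f;  // separate accumulator per block
    for (std::size_t i = begin; i < end; ++i) {
      partial += xs[i];
    }
    total += partial;  // combine the intermediate results
  }
  return total;
}
```

With 20,000,000 ones, every partial is exactly 256 and every running total is a multiple of 256 well below 2^32, so the result is exactly 20000000.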

TODO:
- Decide on a suitable granularity; the current value of 256 was chosen almost arbitrarily.
- Measure performance.
- Test.

This PR is mostly for discussing this alternative; please share your thoughts and I will work on it.

emcastillo · 2020-06-04T09:43:20Z

aten/src/ATen/native/cpu/Reduce.h

This needs to be discussed

dr-ci · 2020-06-04T09:56:32Z

💊 CI failures summary and remediations

As of commit 69d910d (more details on the Dr. CI page):

1/1 failures possibly* introduced in this PR
- 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed

Failed: pr/caffe2-pytorch-linux-xenial-rocm3.3-py3.6-test

This comment was automatically generated by Dr. CI.

peterbell10 · 2020-06-10T13:28:42Z

Hey @emcastillo, I tried to include your version in the error comparison for #39516 but found that your non-vectorized loop has exactly the same error as master. basic_loop accumulates directly into the output, so the same accumulator is shared between batches and it performs the same additions as before.
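The shared-accumulator point can be checked with a small C++ sketch (hypothetical helpers, not the actual basic_loop): splitting the loop into batches changes nothing if every batch keeps adding into the same variable, because the sequence of floating-point additions, and therefore every rounding step, is identical to the unbatched loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Plain running sum with one accumulator (the unbatched baseline).
float running_sum(const std::vector<float>& xs) {
  float out = 0.0f;
  for (float x : xs) {
    out += x;
  }
  return out;
}

// "Batched" loop that still accumulates straight into the same output
// variable: the batches only change the loop bounds, not the additions.
float batched_shared_accumulator(const std::vector<float>& xs,
                                 std::size_t block_size) {
  assert(block_size > 0);
  float out = 0.0f;  // single accumulator shared by every batch
  for (std::size_t begin = 0; begin < xs.size(); begin += block_size) {
    const std::size_t end = std::min(begin + block_size, xs.size());
    for (std::size_t i = begin; i < end; ++i) {
      out += xs[i];  // same operands, same order, same rounding as running_sum
    }
  }
  return out;
}
```

Getting intermediate results requires a fresh accumulator per batch that is only added into the output after the batch finishes.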

emcastillo · 2020-06-12T01:48:04Z

Yeah, you are right; let me give it a twist.

emcastillo · 2020-06-12T06:13:18Z

I added some code that removes basic_loop and instead adds a simplified, non-vectorized version of reduction_128. This also lets us get rid of the code that adapted the reduction's parameters to the element-wise loop that basic_loop is.

I still have to measure performance.
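A sketch of what a non-vectorized outer reduction with per-block partial results might look like, assuming a contiguous row-major `[rows x cols]` float matrix reduced over its rows. `blocked_outer_sum` and `block_rows` are hypothetical names, and this is not the PR's actual reduction_128-based code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Outer reduction: sum a row-major [rows x cols] matrix over its rows,
// producing one result per column. Rows are processed in blocks; each block
// uses its own per-column partial accumulators before updating the output.
std::vector<float> blocked_outer_sum(const std::vector<float>& data,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t block_rows = 256) {
  assert(data.size() == rows * cols);
  assert(block_rows > 0);
  std::vector<float> out(cols, 0.0f);
  std::vector<float> partial(cols);
  for (std::size_t r0 = 0; r0 < rows; r0 += block_rows) {
    const std::size_t r1 = std::min(r0 + block_rows, rows);
    std::fill(partial.begin(), partial.end(), 0.0f);  // fresh accumulators
    for (std::size_t r = r0; r < r1; ++r) {
      const float* row = data.data() + r * cols;
      for (std::size_t c = 0; c < cols; ++c) {
        partial[c] += row[c];  // contiguous inner loop over columns
      }
    }
    for (std::size_t c = 0; c < cols; ++c) {
      out[c] += partial[c];  // fold this block's results into the output
    }
  }
  return out;
}
```

Keeping the inner loop over columns preserves contiguous memory access, while the extra `partial` buffer (one float per column) supplies the intermediate results that a loop accumulating directly into the output lacks.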

emcastillo · 2020-06-12T11:58:37Z

It seems that some tests are affected; I will try to fix them.

aten/src/ATen/native/cpu/Reduce.h

emcastillo · 2020-06-13T04:41:29Z

aten/src/ATen/native/cpu/Reduce.h

I'm not happy with this change, so I am open to suggestions.

peterbell10 · 2020-06-14T13:36:19Z

Here is the error comparison from your latest commit.

emcastillo · 2020-06-14T14:27:48Z

Thanks!
Tuning block_size will have a large impact on the error measurements; I picked 256 without giving it much thought.
This PR is a general solution that might apply to routines other than sum (although there don't seem to be others that would be greatly affected by this issue).
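One way to study the block_size trade-off is to compare a blocked float sum against a higher-precision (double) reference for several block sizes. The helpers below are hypothetical and only illustrate the measurement; they are not the comparison used for #39516. Note that both extremes degenerate to a plain running sum: with a block size of 1 every partial is a single element, and with a block size of n there is only one block.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Blocked float sum: a fresh float accumulator per block of `block_size`
// elements, with the per-block partials combined into a float total.
float batched_sum(const std::vector<float>& xs, std::size_t block_size) {
  assert(block_size > 0);
  float total = 0.0f;
  for (std::size_t begin = 0; begin < xs.size(); begin += block_size) {
    const std::size_t end = std::min(begin + block_size, xs.size());
    float partial = 0.0f;
    for (std::size_t i = begin; i < end; ++i) {
      partial += xs[i];
    }
    total += partial;
  }
  return total;
}

// Reference sum accumulated in double precision.
double reference_sum(const std::vector<float>& xs) {
  double acc = 0.0;
  for (float x : xs) {
    acc += x;
  }
  return acc;
}

// Absolute error of the blocked float sum for one block size.
double batched_sum_error(const std::vector<float>& xs, std::size_t block_size) {
  return std::fabs(static_cast<double>(batched_sum(xs, block_size)) -
                   reference_sum(xs));
}
```

Sweeping `batched_sum_error` over a range of block sizes on data resembling the failing cases is one way to approach the granularity TODO; performance still has to be measured separately.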

ngimel · 2020-10-21T16:09:38Z

#39516, which achieves the same goal, was merged; can we close this?

emcastillo · 2020-10-22T02:05:04Z

I am fine with closing it. I was just wondering whether, since this is a generic solution at the level of the reduction algorithm, other reduction routines could benefit from it, but I can't think of any other than sum/prod.
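To illustrate why the approach is generic, the blocking can be written once for any associative operation with an identity element. `blocked_reduce` is a hypothetical template, not an ATen API. Operations whose steps round, such as floating-point sum and prod, can give different results depending on the block size, whereas max and min are exact, so for non-NaN inputs they return the same result for any block size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Generic blocked reduction: `op` must be associative and `identity` its
// neutral element (0 for +, 1 for *). Each block reduces into its own
// accumulator before being combined into the running total.
template <typename T, typename Op>
T blocked_reduce(const std::vector<T>& xs, T identity, Op op,
                 std::size_t block_size = 256) {
  assert(block_size > 0);
  T total = identity;
  for (std::size_t begin = 0; begin < xs.size(); begin += block_size) {
    const std::size_t end = std::min(begin + block_size, xs.size());
    T partial = identity;  // fresh accumulator for this block
    for (std::size_t i = begin; i < end; ++i) {
      partial = op(partial, xs[i]);
    }
    total = op(total, partial);
  }
  return total;
}
```

For example, `blocked_reduce(v, 1.0, std::multiplies<double>(), 2)` computes a product with blocks of two elements.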

This was referenced Jun 4, 2020:

- Sums of expanded and repeated tensors are different #37234 (Closed)
- torch.mean returns a wildly incorrect result 0.3277 on YCbCr version of CIFAR10 on CPU with dtype=float32 #38716 (Closed)

emcastillo force-pushed the batch-reduction branch from f659579 to 3dc6ce5 June 4, 2020 09:43

pytorchbot added the open source label Jun 4, 2020

mruberry requested a review from VitalyFedyunin June 12, 2020 04:18

mruberry added the triaged label Jun 12, 2020

emcastillo force-pushed the batch-reduction branch 2 times, most recently from 1cbcb40 to 483ed1c June 12, 2020 07:00

Emilio Castillo added 3 commits June 13, 2020 07:05:
- Batched CPU reductions (f2e5ab5)
- Removed basic_loop (5ca9b7b)
- Add a output stride check (69d910d)

emcastillo force-pushed the batch-reduction branch from 96c7a98 to 69d910d June 13, 2020 07:06

emcastillo closed this Oct 27, 2020

5 participants