
Conversation

@ngimel (Collaborator) commented Jul 24, 2020

This uses cub for cum* operations because, unlike thrust, cub is non-synchronizing.

Cub does not support tensors with more than 2**31 elements out of the box (in fact, due to cub bugs the cutoff point is even smaller), so to support them I split the tensor into 2**30-element chunks and modify the first value of the second and subsequent chunks to contain the cumulative result of the previous chunks. Since the modification is done in place on the source tensor, if something goes wrong and we error out before the source tensor is reverted to its original state, the source tensor will be corrupted, but in most cases errors will invalidate the full CUDA context anyway.
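To illustrate the carry trick on host data, here is a minimal, hypothetical C++ sketch (chunked_inclusive_scan is not from the PR; the real code runs cub scans on the GPU and patches the source tensor while writing to a separate output, which is why it must revert the patch afterwards):

#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Scan each chunk independently, carrying the running total by folding the
// previous chunk's last prefix sum into the next chunk's first element.
void chunked_inclusive_scan(std::vector<int64_t>& data, int64_t chunk) {
  for (int64_t i = 0; i < static_cast<int64_t>(data.size()); i += chunk) {
    int64_t n = std::min<int64_t>(data.size() - i, chunk);
    if (i > 0) {
      data[i] += data[i - 1];  // the in-place first-element patch described above
    }
    std::partial_sum(data.begin() + i, data.begin() + i + n, data.begin() + i);
  }
}

Because this sketch scans in place, the patched element is simply overwritten with the correct prefix sum; the PR, by contrast, patches the source tensor and has to restore it, which is how a half-finished error can leave the source corrupted.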

@ngimel ngimel requested review from mruberry and zasdfgbnm July 24, 2020 21:33
constexpr int max_cub_size = std::numeric_limits<int>::max() / 2 + 1; // 2**30
for (int64_t i = 0; i < size; i += max_cub_size) {
int size_cub = std::min<int64_t>(size - i, max_cub_size);
Tensor first_elem; // need to save it for all iterations other than first
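(For reference, std::numeric_limits<int>::max() is 2**31 - 1; integer division by 2 gives 2**30 - 1, and adding 1 yields exactly 2**30, comfortably below cub's int indexing limit.)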
Contributor:

Very clever :)

// need to temporarily transform the first element of the range we are
// operating on; self might be multi-d, but we need to index a single
// element
auto self_view = at::_unsafe_view(self, -1);
Collaborator:

Why view self inside the loop?

Collaborator Author (@ngimel):

In most cases we won't enter this loop; in some others we'll enter it once. Taking the view out of the loop would make us take it unconditionally, which we don't want.
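A hedged reading of the control flow being discussed (the if (i > 0) guard and the elided bookkeeping are assumptions based on this thread, not the PR's exact code):

for (int64_t i = 0; i < size; i += max_cub_size) {
  int size_cub = std::min<int64_t>(size - i, max_cub_size);
  if (i > 0) {
    // Only chunks after the first need their leading element patched, so the
    // flattened view is created here rather than before the loop; tensors
    // that fit in one chunk never pay for it.
    auto self_view = at::_unsafe_view(self, -1);
    // ... save self_view[i - 1] into first_elem, fold it into self_view[i],
    // and restore the original value after this chunk's scan ...
  }
  // ... run the cub scan over elements [i, i + size_cub) ...
}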

@dr-ci bot commented Jul 24, 2020

💊 CI failures summary and remediations

As of commit 3ad7c06 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

XLA failure

Job pytorch_xla_linux_bionic_py3_6_clang9_test is failing. Please create an issue with a title prefixed by [PT_BREAK] in pytorch/xla and link to this PR. If you have questions, please reach out to @ailzhang / @dlibenzi / @JackCaoG.



x_cpu = x.cpu().float()
expected = fn(x_cpu)
actual = fn(x).cpu().float()
self.assertEqual(expected, actual.cpu().float())
Collaborator:

Suggested change:
-    self.assertEqual(expected, actual.cpu().float())
+    self.assertEqual(expected, actual)

Collaborator Author (@ngimel):

I always forget whether assertEqual can handle different devices and dtypes.

Collaborator:

You are already doing .cpu().float() in the line above, so no need to do it here again.

@facebook-github-bot (Contributor) left a comment:

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Natalia Gimelshein added 2 commits July 24, 2020 23:35
@mruberry (Collaborator) left a comment:

LGTM!

@facebook-github-bot (Contributor) left a comment:

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor):

@ngimel merged this pull request in 6ca5421.
