
Conversation

@ngimel (Collaborator) commented May 24, 2019

Uses an int64_t OffsetCalculator when necessary; fix for #20888.
cc @colesbury.

@colesbury, should it be uint32_t here?

    int64_t max_value = std::numeric_limits<int32_t>::max();

The default indexing type is uint32_t, not int32_t.
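For context, the cutoff being discussed can be sketched like this (a hypothetical illustration of the dispatch, not PyTorch's actual TensorIterator code — the function name is made up):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch (not PyTorch's actual code): use the 32-bit offset
// path only while every element offset fits under the cap. Note the cap is
// int32_t max rather than uint32_t max; the reply below explains why the
// fast-division trick needs that extra bit of headroom.
bool can_use_32bit_indexing(int64_t numel) {
  constexpr int64_t max_value = std::numeric_limits<int32_t>::max();
  return numel <= max_value;
}
```

Anything above the cap would fall through to the int64_t OffsetCalculator this PR adds.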

@pytorchbot added labels on May 24, 2019: module: cuda (Related to torch.cuda, and CUDA support in general), module: operators
@colesbury (Member) commented:
> should it be uint32_t here?

No, because the 32-bit code path uses fast integer division (THCIntegerDivider.cuh), and N-bit division can require an (N+1)-bit magic number.

I slightly prefer my patch (#20919). The 32-bit + split approach is slightly faster than 64-bit indexing (23.5 ms vs. 33.3 ms on the repro you wrote for #20888) and has fewer template specializations. Neither point matters much: we aren't optimizing for the giant-tensor case, and the extra specializations are unlikely to have a significant impact on overall compilation time.
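The (N+1)-bit magic-number point can be seen in a standalone sketch of the technique THCIntegerDivider.cuh uses (this is illustrative C++ of my own, not PyTorch's code): for a 32-bit divisor the multiplier is conceptually 33 bits, the implicit top bit is folded into an extra add, and that add stays overflow-free only when indices are capped at int32_t max.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative magic-number unsigned division (the general technique behind
// THCIntegerDivider.cuh; not PyTorch's actual code).
struct FastDivider {
  uint32_t shift;
  uint32_t magic;  // low 32 bits of a conceptually 33-bit multiplier

  explicit FastDivider(uint32_t d) {
    assert(d >= 2);
    shift = 0;
    while ((uint64_t{1} << shift) < d) ++shift;  // shift = ceil(log2(d))
    // Full multiplier is 2^32 + magic, i.e. 33 bits. Only the low word is
    // stored; the implicit top bit becomes the "+ n" in div() below.
    magic = uint32_t((uint64_t{1} << 32) * ((uint64_t{1} << shift) - d) / d + 1);
  }

  uint32_t div(uint32_t n) const {
    // Requires n <= INT32_MAX so that t + n cannot overflow 32 bits --
    // which is why the fast path caps at int32_t max, not uint32_t max.
    assert(n <= uint32_t(INT32_MAX));
    uint32_t t = uint32_t((uint64_t(n) * magic) >> 32);  // mulhi(n, magic)
    return (t + n) >> shift;
  }
};
```

On the GPU the high-word multiply would be `__umulhi`; the host sketch above emulates it with a 64-bit product.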

@ngimel (Collaborator, Author) commented May 24, 2019

Closed in favor of #20919.


Labels: module: cuda (Related to torch.cuda, and CUDA support in general), open source
