Conversation

@nairbv
Collaborator

@nairbv nairbv commented Nov 13, 2018

This comment was marked as off-topic.


@vadimkantorov
Contributor

vadimkantorov commented Nov 14, 2018

Does the fix affect the overload torch.arange(end)? The corresponding ATen code seems to call in the legacy function without the new check: https://github.com/pytorch/pytorch/blob/f472a5075c83b26a5f1177b214ce3f0d691197c3/aten/src/ATen/native/TensorFactories.cpp#L98

The same for all arange_out overloads

@nairbv
Collaborator Author

nairbv commented Nov 14, 2018

Does the fix affect the overload torch.arange(end)?

Nope, that's still broken. I'll fix that too and add tests for the other cases.

@vadimkantorov
Contributor

vadimkantorov commented Nov 14, 2018

@nairbv I don't understand PyTorch's internals very well. Is _th_arange_out also handled by the (now fixed) THTensor_(arange)(THTensor *r_, accreal xmin, accreal xmax, accreal step)? (There isn't a test case for this, so I'm not sure.)

@ezyang
Contributor

ezyang commented Nov 14, 2018

Appears so:

Tensor & CPUFloatType::_th_arange_out(Tensor & result, Scalar start, Scalar end, Scalar step) const {
    // DeviceGuard omitted
    auto result_ = checked_tensor_unwrap(result,"result",0, false, Backend::CPU, ScalarType::Float);
    auto start_ = start.toDouble();
    auto end_ = end.toDouble();
    auto step_ = step.toDouble();
    THFloatTensor_arange(result_, start_, end_, step_);
    return result;
}

@nairbv
Collaborator Author

nairbv commented Nov 14, 2018

@nairbv I don't understand PyTorch's internals very well. Is _th_arange_out also handled by (the now fixed) THTensor_(arange)(THTensor *r_, accreal xmin, accreal xmax, accreal step)?

Yes. I'm new to this as well, but the easiest way to see it is to build the code: the function will appear in the generated file CPUFloatType.cpp (generated by aten/src/ATen/gen.py).

All of the _th_arange* functions call THFloatTensor_arange(result_, start_, end_, step_), where start_ sometimes defaults to 0, step_ sometimes defaults to 1, and result_ is sometimes a parameter and sometimes created internally. Macro expansions lead from the THTensor_(arange) changed here to the THFloatTensor_arange being called.

@vadimkantorov
Contributor

Is CUDA version also fixed by the edits in THTensor_(arange)?

It was before the fix, so CUDA was affected too:

>>> a = torch.arange(0, float('inf'), device = 'cuda')
>>> a.shape
torch.Size([-9223372036854775808])

@vadimkantorov
Contributor

If you move the guard down to TH (as it appears now), it probably should also be duplicated in THC:

void THCTensor_(arange)(THCState* state, THCTensor *r_, accreal xmin, accreal xmax, accreal step) {

This comment was marked as off-topic.


@vadimkantorov
Contributor

vadimkantorov commented Nov 16, 2018

Can an overflow result in a positive value or even zero? Probably not? (It depends on how many times the overflow wraps; probably not more than once.)

@nairbv
Collaborator Author

nairbv commented Nov 19, 2018

Can an overflow result in a positive value or even zero? Probably not? (It depends on how many times the overflow wraps; probably not more than once.)

Hm... it seems the behavior we're seeing is in the cast from double to int64_t (or actually, to ptrdiff_t). From what I can tell, the behavior of an out-of-range cast is undefined, but in practice it doesn't seem to be wrapping. You saw this -9223372036854775808 value in tests (the minimum long value, i.e. -(max long + 1)).

We could try a range check on the double first. That's tricky, though, because ptrdiff_t's max value isn't exactly representable in floating point, and could differ on 32-bit systems. I'm not sure there is actually a system that would wrap it, but I'm also pretty new to C++.

I'm a bit skeptical of the floating-point arithmetic of (max-min)/step in general, but I don't see a specific issue with it yet, and we might be getting beyond the scope of the original ticket. Maybe we should get this merged, and file another ticket if we find another specific failure case?

scalar_t i = 0;

THArgCheck(step > 0 || step < 0, 3, "step must be nonzero");
THArgCheck(std::isfinite((double)xmin) && std::isfinite((double)xmax)

This comment was marked as off-topic.


@nairbv
Collaborator Author

nairbv commented Nov 20, 2018

One of the newly-added tests did end up failing in one environment with:

Nov 19 23:30:46 test_arange (__main__.TestTorch) ... /var/lib/jenkins/workspace/aten/src/TH/generic/THTensorMoreMath.cpp:645:22: runtime error: 3.40282e+38 is outside the range of representable values of type 'long'

so I guess we will need to explicitly check the ranges before casting.

@nairbv
Collaborator Author

nairbv commented Nov 21, 2018

@pytorchbot retest this please

@nairbv
Collaborator Author

nairbv commented Nov 26, 2018

@pytorchbot retest this please

@ezyang
Contributor

ezyang commented Nov 27, 2018

I'm a bit skeptical of the floating-point arithmetic of (max-min)/step in general, but I don't see a specific issue with it yet, and we might be getting beyond the scope of the original ticket. Maybe we should get this merged, and file another ticket if we find another specific failure case?

I don't think anyone is actually blocked by the lack of a check here. Let's take the time to do it right.

THArgCheck(step > 0 || step < 0, 3, "step must be nonzero");
THArgCheck(std::isfinite(static_cast<double>(xmin)) &&
std::isfinite(static_cast<double>(xmax))
, 1, "unsupported range: ", xmin, " -> ", xmax);
Contributor


I checked the bug report, and the PR comments, but I don't see why you need to cast the quantity to a double before checking if it's finite. I'm pretty sure std::isfinite has overloads for the integral types too. Is there something I'm missing?

Collaborator Author


Hmm... I thought I remembered getting an "ambiguous call to overloaded function" error when I didn't cast, but I'm not seeing that now.

Collaborator Author


Ah, yeah, it looks like that 'ambiguous' error only occurs on Windows. I'll need to re-add the cast:

15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(402): error C2668: 'fpclassify': ambiguous call to overloaded function
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(299): note: could be 'int fpclassify(long double) throw()'
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(294): note: or       'int fpclassify(double) throw()'
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(289): note: or       'int fpclassify(float) throw()'
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(402): note: while trying to match the argument list '(int64_t)'
15:26:10 c:\jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-build\aten\src\th\generic/THTensorMoreMath.cpp(642): note: see reference to function template instantiation 'bool isfinite<int64_t>(_Ty) throw()' being compiled
15:26:10         with
15:26:10         [
15:26:10             _Ty=int64_t
15:26:10         ]

Contributor


Yeah, so, that is a great thing to have as a comment here, because it is not obvious.

double size_d = ceil(static_cast<double>(xmax - xmin) / step);
THArgCheck(size_d >= 0 && size_d <= static_cast<double>(PTRDIFF_MAX)
, 1, "invalid size, possible overflow?");
size = static_cast<ptrdiff_t>(size_d);
Contributor


Yeah, the static_cast<double>(PTRDIFF_MAX) gives me hives. I guess it's OK not to fix it properly for now, but if you don't add a comment here, all of the discussion on the PR will be lost to the sands of time. Please put a comment here!

Collaborator Author


@ezyang Is there a better way, though? As long as we're converting a double to an integral type there needs to be a cast, and at least now we've properly range-checked it.

Contributor


I couldn't think of a better way to fix it, which is why I did not ask you to fix it :)

Contributor

@vadimkantorov vadimkantorov Dec 3, 2018


Contributor

@ezyang ezyang left a comment


okey dokey

Contributor

@facebook-github-bot facebook-github-bot left a comment


@nairbv is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 28, 2018
Summary: Pull Request resolved: pytorch/pytorch#13915

Differential Revision: D13222110

Pulled By: nairbv

fbshipit-source-id: fcff1ad058fbf792d0fdf4aa75d77f22e3b7483b
@ezyang ezyang added the merged label Jun 25, 2019