check for invalid ranges in torch.arange #13915
Conversation
Does the fix affect the … overload too? The same for all …?
Nope, that's still broken. I'll fix that too and add tests for the other cases.
@nairbv I don't understand PyTorch's internals very well. Is …
Appears so: |
Yes, I'm new to this as well, but the easiest way to see it is to build the code: it'll appear in the generated file CPUFloatType.cpp (generated by aten/src/ATen/gen.py). All of the …
Is the CUDA version also fixed by the edits in …? For …, before it was …, so CUDA was affected too:
If you move the guard down to TH (as it appears now), it probably should also be duplicated in THC: pytorch/aten/src/THC/generic/THCTensorMath.cu, line 461 at 8682999.
Can an overflow result in a positive value or even a zero? Probably not? (It depends on how many times the overflow wraps around; probably not more than once?)
Hm... it seems that the behavior we're seeing is in the cast from double to int64_t (or actually, to ptrdiff_t). From what I can tell, the behavior for an out-of-range cast is undefined, but in practice it doesn't seem to be wrapping. You saw this -9223372036854775808 value in tests (the minimum long value, i.e., max long + 1 wrapped around). We could try a range check on the double first. That's tricky, though, because ptrdiff_t's max value isn't exactly representable in floating point, and could differ on 32-bit systems. I'm not sure there is actually a system that would wrap it, but I'm also pretty new to C++. I'm a bit skeptical of the floating-point arithmetic in general of …
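To make that range-check idea concrete, here is a minimal sketch (a hypothetical checked_cast helper, not code from this PR, assuming a 64-bit int64_t):

#include <cmath>
#include <cstdint>
#include <stdexcept>

// An out-of-range double-to-int64_t conversion is undefined behavior,
// so check the range first. The bounds must themselves be exactly
// representable as doubles: -2^63 is, but INT64_MAX (2^63 - 1) is not,
// so the upper bound is a strict < against 2^63.
int64_t checked_cast(double d) {
  if (!(std::isfinite(d) &&
        d >= -9223372036854775808.0 &&  // -2^63, exact as a double
        d < 9223372036854775808.0)) {   // 2^63, exact as a double
    throw std::range_error("value out of range for int64_t");
  }
  return static_cast<int64_t>(d);
}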
Added parameter checks to the CUDA version, used the standard THArgCheck instead of AT_CHECK, and updated tests to verify behavior on both device types.
scalar_t i = 0;

THArgCheck(step > 0 || step < 0, 3, "step must be nonzero");
THArgCheck(std::isfinite((double)xmin) && std::isfinite((double)xmax)
One of the newly-added tests did end up failing in one environment with:

Nov 19 23:30:46 test_arange (__main__.TestTorch) ... /var/lib/jenkins/workspace/aten/src/TH/generic/THTensorMoreMath.cpp:645:22: runtime error: 3.40282e+38 is outside the range of representable values of type 'long'

so I guess we will need to explicitly check the ranges before casting.
@pytorchbot retest this please

@pytorchbot retest this please
I don't think anyone is actually blocked by the lack of a check here. Let's take the time to do it right.
THArgCheck(step > 0 || step < 0, 3, "step must be nonzero");
THArgCheck(std::isfinite(static_cast<double>(xmin)) &&
           std::isfinite(static_cast<double>(xmax)),
           1, "unsupported range: ", xmin, " -> ", xmax);
I checked the bug report and the PR comments, but I don't see why you need to cast the quantity to a double before checking whether it's finite. I'm pretty sure std::isfinite has overloads for the integral types too. Is there something I'm missing?
Hmm... I thought I remembered getting an "ambiguous call to overloaded function" error when I didn't cast, but I'm not seeing that now.
Ah, yeah, it looks like that 'ambiguous' error only occurs on Windows. I'll need to re-add the cast:
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(402): error C2668: 'fpclassify': ambiguous call to overloaded function
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(299): note: could be 'int fpclassify(long double) throw()'
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(294): note: or 'int fpclassify(double) throw()'
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(289): note: or 'int fpclassify(float) throw()'
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(402): note: while trying to match the argument list '(int64_t)'
15:26:10 c:\jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-build\aten\src\th\generic/THTensorMoreMath.cpp(642): note: see reference to function template instantiation 'bool isfinite<int64_t>(_Ty) throw()' being compiled
15:26:10 with
15:26:10 [
15:26:10 _Ty=int64_t
15:26:10 ]
Yeah, so, that is a great thing to have as a comment here, because it is not obvious.
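For reference, the kind of comment being asked for might look like this (my phrasing, not the PR's):

// scalar_t may be an integral type such as int64_t. MSVC's fpclassify
// only has float/double/long double overloads, so std::isfinite(int64_t)
// is an ambiguous call on Windows; cast to double explicitly first.
THArgCheck(std::isfinite(static_cast<double>(xmin)) &&
           std::isfinite(static_cast<double>(xmax)),
           1, "unsupported range: ", xmin, " -> ", xmax);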
double size_d = ceil(static_cast<double>(xmax - xmin) / step);
THArgCheck(size_d >= 0 && size_d <= static_cast<double>(PTRDIFF_MAX),
           1, "invalid size, possible overflow?");
size = static_cast<ptrdiff_t>(size_d);
Yeah, the static_cast<double>(PTRDIFF_MAX) gives me hives. I guess it's OK not to fix it properly for now, but if you don't add a comment here, all of the discussion on the PR will be lost to the sands of time. Please put a comment here!
@ezyang is there a better way, though? As long as we're converting a double to an integral type there needs to be a cast, and at least now we've properly range-checked it.
I couldn't think of a better way to fix it, which is why I did not ask you to fix it :)
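For the record, here is the subtlety in a self-contained sketch (assumes a 64-bit ptrdiff_t; this is an illustration, not code from the PR):

#include <cstdint>
#include <cstdio>

int main() {
  // INT64_MAX (2^63 - 1) is not representable as a double (the ulp at
  // that magnitude is 1024), so it rounds up to exactly 2^63.
  double limit = static_cast<double>(INT64_MAX);
  double size_d = 9223372036854775808.0;  // exactly 2^63
  printf("%d\n", size_d <= limit);        // prints 1: the guard passes
  // ...yet static_cast<int64_t>(size_d) here would be undefined
  // behavior, since 2^63 > INT64_MAX. A strict < against 2^63 avoids it.
  return 0;
}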
FWIW, this seems to be the Boost way: https://www.boost.org/doc/libs/1_57_0/libs/numeric/conversion/doc/html/boost_numericconversion/improved_numeric_cast__.html (maybe its source code has a relatively self-contained function).
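A sketch of what that would look like (assuming Boost were available; PyTorch doesn't actually take that dependency):

#include <boost/numeric/conversion/cast.hpp>
#include <cstddef>

// boost::numeric_cast range-checks the conversion and throws
// boost::numeric::positive_overflow / negative_overflow (both derived
// from bad_numeric_cast) instead of invoking undefined behavior.
ptrdiff_t checked_size(double size_d) {
  return boost::numeric_cast<ptrdiff_t>(size_d);
}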
ezyang left a comment:
okey dokey
This reverts commit eba160c.
facebook-github-bot left a comment:
@nairbv is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Pull Request resolved: pytorch/pytorch#13915
Differential Revision: D13222110
Pulled By: nairbv
fbshipit-source-id: fcff1ad058fbf792d0fdf4aa75d77f22e3b7483b
#13782