Conversation

@nairbv
Collaborator

@nairbv nairbv commented Nov 13, 2018

This comment was marked as off-topic.


@vadimkantorov
Contributor

vadimkantorov commented Nov 14, 2018

Does the fix affect the overload torch.arange(end)? The corresponding ATen code seems to call in the legacy function without the new check: https://github.com/pytorch/pytorch/blob/f472a5075c83b26a5f1177b214ce3f0d691197c3/aten/src/ATen/native/TensorFactories.cpp#L98

The same for all arange_out overloads

@nairbv
Collaborator Author

nairbv commented Nov 14, 2018

Does the fix affect the overload torch.arange(end)?

Nope, that's still broken. I'll fix that too and add tests for the other cases.

@vadimkantorov
Contributor

vadimkantorov commented Nov 14, 2018

@nairbv I don't understand PyTorch's internals very well. Is _th_arange_out also handled by the (now fixed) THTensor_(arange)(THTensor *r_, accreal xmin, accreal xmax, accreal step)? (There isn't a test case for this, so I'm not sure.)

@ezyang
Contributor

ezyang commented Nov 14, 2018

Appears so:

Tensor & CPUFloatType::_th_arange_out(Tensor & result, Scalar start, Scalar end, Scalar step) const {
    // DeviceGuard omitted
    auto result_ = checked_tensor_unwrap(result,"result",0, false, Backend::CPU, ScalarType::Float);
    auto start_ = start.toDouble();
    auto end_ = end.toDouble();
    auto step_ = step.toDouble();
    THFloatTensor_arange(result_, start_, end_, step_);
    return result;
}

@nairbv
Collaborator Author

nairbv commented Nov 14, 2018

@nairbv I don't understand PyTorch's internals very well. Is _th_arange_out also handled by (the now fixed) THTensor_(arange)(THTensor *r_, accreal xmin, accreal xmax, accreal step)?

Yes. I'm new to this as well, but the easiest way to see it is to build the code: the function will appear in the generated file CPUFloatType.cpp (generated by aten/src/ATen/gen.py).

All of the _th_arange* functions call THFloatTensor_arange(result_, start_, end_, step_), where start_ sometimes defaults to 0, step_ sometimes defaults to 1, and result_ is sometimes a parameter and sometimes created internally. Macro expansions lead from the THTensor_(arange) changed here to the THFloatTensor_arange being called.

@vadimkantorov
Contributor

Is CUDA version also fixed by the edits in THTensor_(arange)?

It was before the fix, so CUDA was affected too:

>>> a = torch.arange(0, float('inf'), device = 'cuda')
>>> a.shape
torch.Size([-9223372036854775808])

@vadimkantorov
Contributor

If you move the guard down to TH (as it appears now), it probably should also be duplicated in THC:

void THCTensor_(arange)(THCState* state, THCTensor *r_, accreal xmin, accreal xmax, accreal step) {

This comment was marked as off-topic.


@vadimkantorov
Contributor

vadimkantorov commented Nov 16, 2018

Can an overflow result in a positive value or even zero? Probably not? (It depends on how many times the overflow wraps; probably not more than once.)

@nairbv
Collaborator Author

nairbv commented Nov 19, 2018

Can an overflow result in a positive value or even zero? Probably not? (It depends on how many times the overflow wraps; probably not more than once.)

Hm... it seems the behavior we're seeing is in the cast from double to int64_t (or actually, to ptrdiff_t). From what I can tell, the behavior of an out-of-range cast is undefined, but in practice it doesn't seem to be wrapping. You saw this -9223372036854775808 value in tests (the minimum long value, i.e. -(max long + 1)).

We could try a range check on the double first. That's tricky, though, because ptrdiff_t's max value isn't exactly representable in floating point, and could differ on 32-bit systems. I'm not sure there is actually a system that would wrap it, but I'm also pretty new to C++.

I'm a bit skeptical of the floating-point arithmetic of (max-min)/step in general, but I don't see a specific issue with it yet, and we might be getting beyond the scope of the original ticket. Maybe we should get this merged, and file another ticket if we find another specific failure case?

scalar_t i = 0;

THArgCheck(step > 0 || step < 0, 3, "step must be nonzero");
THArgCheck(std::isfinite((double)xmin) && std::isfinite((double)xmax)

This comment was marked as off-topic.


@nairbv
Collaborator Author

nairbv commented Nov 20, 2018

One of the newly-added tests did end up failing in one environment with:

Nov 19 23:30:46 test_arange (__main__.TestTorch) ... /var/lib/jenkins/workspace/aten/src/TH/generic/THTensorMoreMath.cpp:645:22: runtime error: 3.40282e+38 is outside the range of representable values of type 'long'

so I guess we will need to explicitly check the ranges before casting.

@nairbv
Collaborator Author

nairbv commented Nov 21, 2018

@pytorchbot retest this please

@nairbv
Collaborator Author

nairbv commented Nov 26, 2018

@pytorchbot retest this please

@ezyang
Contributor

ezyang commented Nov 27, 2018

I'm a bit skeptical of the floating-point arithmetic of (max-min)/step in general, but I don't see a specific issue with it yet, and we might be getting beyond the scope of the original ticket. Maybe we should get this merged, and file another ticket if we find another specific failure case?

I don't think anyone is actually blocked by the lack of a check here. Let's take the time to do it right.

THArgCheck(step > 0 || step < 0, 3, "step must be nonzero");
THArgCheck(std::isfinite(static_cast<double>(xmin)) &&
std::isfinite(static_cast<double>(xmax))
, 1, "unsupported range: ", xmin, " -> ", xmax);
Contributor


I checked the bug report, and the PR comments, but I don't see why you need to cast the quantity to a double before checking if it's finite. I'm pretty sure std::isfinite has overloads for the integral types too. Is there something I'm missing?

Collaborator Author


Hmm... I thought I remembered getting an "ambiguous call to overloaded function" error when I didn't cast, but I'm not seeing that now.

Collaborator Author


Ah, yeah, it looks like that 'ambiguous' error only occurs on Windows. I'll need to re-add the cast:

15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(402): error C2668: 'fpclassify': ambiguous call to overloaded function
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(299): note: could be 'int fpclassify(long double) throw()'
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(294): note: or       'int fpclassify(double) throw()'
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(289): note: or       'int fpclassify(float) throw()'
15:26:10 C:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt\corecrt_math.h(402): note: while trying to match the argument list '(int64_t)'
15:26:10 c:\jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-build\aten\src\th\generic/THTensorMoreMath.cpp(642): note: see reference to function template instantiation 'bool isfinite<int64_t>(_Ty) throw()' being compiled
15:26:10         with
15:26:10         [
15:26:10             _Ty=int64_t
15:26:10         ]

Contributor


Yeah, so, that is a great thing to have as a comment here, because it is not obvious.

double size_d = ceil(static_cast<double>(xmax - xmin) / step);
THArgCheck(size_d >= 0 && size_d <= static_cast<double>(PTRDIFF_MAX)
, 1, "invalid size, possible overflow?");
size = static_cast<ptrdiff_t>(size_d);
Contributor


Yeah, the static_cast<double>(PTRDIFF_MAX) gives me hives. I guess it's OK not to fix it properly for now, but if you don't add a comment here, all of the discussion on the PR will be lost to the sands of time. Please put a comment here!

Collaborator Author


@ezyang Is there a better way, though? As long as we're converting a double to an integral type there needs to be a cast, and at least now we've properly range-checked it.

Contributor


I couldn't think of a better way to fix it, which is why I did not ask you to fix it :)

Contributor

@vadimkantorov vadimkantorov Dec 3, 2018


Contributor

@ezyang ezyang left a comment


okey dokey

Contributor

@facebook-github-bot facebook-github-bot left a comment


@nairbv is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 28, 2018
Summary: Pull Request resolved: pytorch/pytorch#13915

Differential Revision: D13222110

Pulled By: nairbv

fbshipit-source-id: fcff1ad058fbf792d0fdf4aa75d77f22e3b7483b
@ezyang ezyang added the merged label Jun 25, 2019