implement NumPy-like functionality maximum, minimum #42579
Conversation
💊 CI failures summary and remediations. As of commit 965b236 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
Force-pushed from 7f07a5d to 4c0cecb.
@mruberry This PR implements a pair of NumPy-like functions, maximum and minimum.
Force-pushed from 4c0cecb to 203456b.
Force-pushed from 203456b to 1da9391.
aten/src/ATen/native/BinaryOps.cpp (Outdated)

    }

    Tensor maximum(const Tensor& self, const Tensor& other) {
      Tensor result = at::empty(0, self.options());
Suggested change: declare the result as Tensor result; instead.
Thanks for pointing this out. I have changed the code to Tensor result and used iter.output() as the return value.
aten/src/ATen/native/BinaryOps.cpp (Outdated)

    }

    Tensor minimum(const Tensor& self, const Tensor& other) {
      Tensor result = at::empty(0, self.options());
Same suggestion as above: Tensor result;
Addressed.
    } else {
      AT_DISPATCH_COMPLEX_TYPES(iter.dtype(), "maximum_cpu", [&] {
        cpu_kernel(iter, [](scalar_t a, scalar_t b) -> scalar_t {
          if (_isnan(a.real()) || _isnan(a.imag())) {
I think isnan works on complex, see:
    Tensor isnan(const Tensor& self) {
It's documented as doing so:
https://pytorch.org/docs/master/generated/torch.isnan.html?highlight=isnan#torch.isnan
This should save you from looking at the real and imaginary parts separately.
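For illustration, a minimal check of that behavior (assuming a build where torch.isnan accepts complex tensors, per the docs linked above):

    import torch

    # torch.isnan on a complex tensor is True when either the real or the
    # imaginary component is NaN, so no per-part check is needed.
    z = torch.tensor([complex(1.0, float('nan')), complex(2.0, 3.0)])
    print(torch.isnan(z))  # tensor([ True, False])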
However... (see above)
Thanks for the detailed explanation of the isnan function.
As complex numbers are not supported, the complex-type dispatch code has been removed in the latest commit.
        });
      });
    } else {
      AT_DISPATCH_COMPLEX_TYPES(iter.dtype(), "maximum_cpu", [&] {
Comparisons between complex numbers aren't well-defined mathematically and we prohibit them elsewhere (NumPy is updating their functionality to conform to ours). I would think that means that we shouldn't support maximum and minimum for complex tensors?
Thanks for the kind suggestions, I have added checks to reject complex inputs and deleted the redundant complex dispatch code.
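As a rough sketch of the agreed-upon behavior (the exact error message raised by the check is an assumption here, so it is not matched):

    import torch

    a = torch.tensor([1 + 1j, 2 + 2j])
    b = torch.tensor([2 + 0j, 1 - 1j])

    # With the checks added in this PR, maximum/minimum reject complex inputs
    # instead of attempting an ill-defined comparison.
    try:
        torch.maximum(a, b)
    except RuntimeError as err:
        print("complex inputs rejected:", err)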
      }
    }

    void minimum_kernel(TensorIterator& iter) {
See symmetric comments above.
addressed
    namespace at { namespace native {

    void maximum_kernel_cuda(TensorIterator& iter) {
See symmetric comments above.
addressed
    AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.input_dtype(), "maximum_cuda", [&]() {
      gpu_kernel(iter, [] GPU_LAMBDA (scalar_t a, scalar_t b) -> scalar_t {
        // isnan(half) breaks the Windows build. We explicitly cast half to float.
        using acc_type = typename AccumulateType<scalar_t, /*is_cuda=*/true>::type;
Why use the accumulate type here?
I was referring to the following code from MaxMinElementwiseKernel.cu, which prevents isnan from breaking the Windows build for half types.
pytorch/aten/src/ATen/native/cuda/MaxMinElementwiseKernel.cu
Lines 26 to 32 in e8f4b04

    gpu_kernel(iter, []GPU_LAMBDA(scalar_t a, scalar_t b) -> scalar_t {
      // isnan(half) breaks the Windows build. We explicitly cast half to float.
      using acc_type = typename AccumulateType<scalar_t, /*is_cuda=*/true>::type;
      // We avoid using nan or nanf because we want to return the same type as scalar_t.
      if (::isnan(static_cast<acc_type>(a))) {
        return a;
      } else if (::isnan(static_cast<acc_type>(b))) {
Thanks to the suggestion to use a != a directly, the accumulate type is no longer needed.
    } else if (isFloatingType(iter.dtype())) {
      AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.input_dtype(), "minimum_cuda", [&]() {
        gpu_kernel(iter, [] GPU_LAMBDA (scalar_t a, scalar_t b) -> scalar_t {
          // isnan(half) breaks the Windows build. We explicitly cast half to float.
Would (a != a) work?
Yes, it works, and I now use a != a directly to check for NaNs.
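A small Python analogue of the simplified NaN check (the helper name is made up for illustration; the actual kernel is the CUDA lambda above):

    def nan_propagating_max(a, b):
        # a != a is true only when a is NaN, so no isnan()/accumulate-type
        # cast is required; NaNs still propagate to the result.
        if a != a:
            return a
        if b != b:
            return b
        return a if a > b else b

    print(nan_propagating_max(1.0, float('nan')))  # nan
    print(nan_propagating_max(1.0, 2.0))           # 2.0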
      variants: method, function

    - func: maximum.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
      dispatch:
I don't think you need to be explicit about this dispatch. I believe it will automatically dispatch to maximum_out.
Thanks for pointing that out, I have removed the redundant dispatch sections.
      variants: method, function

    - func: minimum.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
      dispatch:
Same note on dispatch here.
addressed.
    @unittest.skipIf(not TEST_NUMPY, "Numpy not found")
    @dtypes(*(torch.testing.get_all_int_dtypes() + [torch.bool]))
    def test_maximum_minimum_int_and_bool(self, device, dtype):
This test is already very good. You could improve it by generating the NumPy arrays using https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html?highlight=integers#numpy.random.Generator.integers instead of handwriting a tuple of values. The arrays can then be converted to torch tensors with torch.from_numpy(a).to(device=device).
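A sketch of the suggested pattern (the bounds, shape, fixed seed, and hardcoded device are illustrative choices, not necessarily what the test uses):

    import numpy as np
    import torch

    rng = np.random.default_rng(0)
    # Let NumPy generate the integer inputs instead of handwriting tuples.
    a_np = rng.integers(low=-100, high=100, size=(10,), dtype=np.int64)
    b_np = rng.integers(low=-100, high=100, size=(10,), dtype=np.int64)

    # Convert to torch tensors on the target device.
    device = "cpu"
    a = torch.from_numpy(a_np).to(device=device)
    b = torch.from_numpy(b_np).to(device=device)

    assert torch.equal(torch.maximum(a, b).cpu(),
                       torch.from_numpy(np.maximum(a_np, b_np)))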
Thanks for the kind suggestions, I have updated test_maximum_minimum_int_and_bool to let NumPy generate random inputs for me.
test/test_torch.py (Outdated)

    b_np = np.array(b_vals, dtype=torch_to_numpy_dtype_dict[dtype])
    numpy_result = numpy_op(a_np, b_np)

    self.assertEqual(tensor_result.cpu(), torch.from_numpy(numpy_result))
I think you can compare directly to the NumPy array without converting it to a torch.Tensor.
Addressed.
test/test_torch.py (Outdated)

    # np.maximum and np.minimum functions compare input arrays element-wisely.
    # if one of the elements being compared is a NaN, then that element is returned.
    ops = ((torch.maximum, np.maximum), (torch.minimum, np.minimum))
    a_vals = (float('inf'), -float('inf'), 2.0, float('nan'), -1.0, float('nan'))
You could also simplify this test by using np.random.randn to generate a float tensor, if you like. You would still have to generate an "extremal values" test case with -inf, inf, and nan by hand, though. You can also simplify by just comparing the torch and NumPy results; you don't need to assert the results are correct separately.
To handle bfloat16 you can just cast the generated arrays to the desired torch type (leaving the arrays in double) and then compare using self.assertEqual(actual, expected, exact_dtype=False).
Thanks for the suggestions, I have split the test cases into two methods:
- test_maximum_minimum_float: uses np.random.randn to generate random float inputs.
- test_maximum_minimum_float_nan_and_inf: tests extremal values like nan, -inf, and inf.
For bfloat16, I have to set rtol and atol as below to prevent precision issues:
    self.assertEqual(tensor_result, numpy_result, rtol=0.016, atol=1e-5, exact_dtype=False)
The values of rtol and atol follow the ones in the common_utils.py file (a condensed sketch of the two tests follows the excerpt below).
pytorch/torch/testing/_internal/common_utils.py
Lines 939 to 942 in 42b4a71

    # dtype name : (rtol, atol)
    dtype_precisions = {
        torch.float16 : (0.001, 1e-5),
        torch.bfloat16 : (0.016, 1e-5),
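A condensed sketch of what the two tests end up checking, under the names and tolerances described above (the shapes, the use of np.allclose instead of self.assertEqual, and the bfloat16-only focus are simplifications):

    import numpy as np
    import torch

    # Random float inputs kept in double precision on the NumPy side,
    # as in test_maximum_minimum_float; extremal values handled separately,
    # as in test_maximum_minimum_float_nan_and_inf.
    a_np = np.random.randn(10)
    b_np = np.random.randn(10)
    ext_a = np.array([float('inf'), -float('inf'), float('nan')])
    ext_b = np.array([float('nan'), -float('inf'), float('inf')])

    for x_np, y_np in ((a_np, b_np), (ext_a, ext_b)):
        expected = np.maximum(x_np, y_np)
        # Cast only the torch inputs to bfloat16, then compare with the
        # loosened rtol/atol since the NumPy result stays in float64.
        x = torch.from_numpy(x_np).to(torch.bfloat16)
        y = torch.from_numpy(y_np).to(torch.bfloat16)
        actual = torch.maximum(x, y).to(torch.float64).numpy()
        assert np.allclose(actual, expected, rtol=0.016, atol=1e-5, equal_nan=True)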
test/test_torch.py (Outdated)

    @unittest.skipIf(not TEST_NUMPY, "Numpy not found")
    @dtypes(torch.complex64, torch.complex128)
    def test_maximum_minimum_complex(self, device, dtype):
See above comments: should we support complex?
The test cases for complex numbers have been updated.
The current test cases check whether the expected error is raised when either input is a complex tensor.
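A standalone sketch of that kind of check (the test class and method names are illustrative, not the ones in test_torch.py, and the exact error message is intentionally not matched):

    import unittest
    import torch

    class TestMaximumMinimumComplex(unittest.TestCase):
        def test_complex_inputs_rejected(self):
            a = torch.tensor([1 + 1j, 2 - 2j], dtype=torch.complex64)
            b = torch.tensor([0 + 0j, 3 + 1j], dtype=torch.complex64)
            for op in (torch.maximum, torch.minimum):
                # Either complex input should trigger the new check.
                with self.assertRaises(RuntimeError):
                    op(a, b)

    if __name__ == "__main__":
        unittest.main()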
torch/_torch_docs.py (Outdated)

    .. note::
        If one of the elements being compared is a NaN, then that element is returned.
        If both elements are NaNs then the first is returned.
This part of the note may be removed.
The sentence "If both elements are NaNs then the first is returned." has been removed, and the whole note now matches the one for the minimum function.
Please kindly confirm.
mruberry left a comment:
Nice teamwork!
facebook-github-bot left a comment:
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Just waiting on resolution of pytorch/xla#2424 now.
Hi @RockingJavaBean, is it possible to rebase this branch onto pytorch master? I submitted pytorch/xla#2438 to resolve the xla issue and pinned this PR as the pytorch version. However, #43047 is missing from this branch and the xla_build will fail. I patched this PR and tested my change locally, and everything seems to be fine, but it is still better to let CircleCI do its job.
@RockingJavaBean Thanks! I saw all tests passed on XLA:CPU but some failed on XLA:GPU; this is most likely due to different handling of nan in XLA:GPU. I will check those tomorrow.
Thanks for helping out, @JackCaoG!
@mruberry The
facebook-github-bot left a comment:
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Great work all! Especially @ailzhang and @JackCaoG for taking the time to help us out with PyTorch/XLA. @RockingJavaBean, let me know if you'd like some suggestions for other PyTorch issues or features.
@mruberry I’m so grateful for your kind help and patient guidance throughout this PR.
Happy to help you find another project, @RockingJavaBean. Would you like something simpler or more challenging: another function like maximum and minimum, something different like a function that would use existing PyTorch functions to manipulate tensors, or a function with more challenging numeric and/or performance issues? If you could suggest what you're interested in, it'll help me think of a fun project.
@mruberry thank you so much for the advice, I would like to try more challenging issues.
OK, let me suggest a project and you can think about whether it would be interesting. In PyTorch we have torch.linalg.outer, which is an alias for torch.ger, a function which computes the outer product of two vectors. The CPU implementation for this is very old. So old, in fact, that it's in the TH tensor library (see pytorch/aten/src/TH/generic/THBlas.cpp, line 114 in f06d390).
There are several challenges with this function:
These could even be done in 2-3 PRs (handling complex dtypes should definitely be its own PR, the other issues could be combined). This would require benchmarking the current BLAS-based solution with a new alternative (like reusing matmul) on various sizes and dtypes, as well as understanding the current code while writing a new, possibly faster implementation. I think this is a really interesting project, but if it's not what you had in mind let me know and I can suggest something else. The goal here would be to demonstrate a faster implementation for PyTorch's CPU outer product and modernize the code. Follow-up work could include handling complex vectors or reviewing the CUDA outer product implementation.
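A rough starting point for the benchmarking part of that project (the size, dtype, and the broadcasting-based alternative below are arbitrary choices; timeit gives only coarse numbers):

    import timeit
    import torch

    def outer_via_broadcast(a, b):
        # One possible modern alternative to the TH/BLAS path:
        # a broadcasted multiply, (n, 1) * (1, m) -> (n, m).
        return a.unsqueeze(1) * b.unsqueeze(0)

    a = torch.randn(4096)
    b = torch.randn(4096)

    # Sanity-check that both paths agree before timing them.
    assert torch.allclose(torch.ger(a, b), outer_via_broadcast(a, b))

    for name, fn in (("torch.ger", lambda: torch.ger(a, b)),
                     ("broadcast", lambda: outer_via_broadcast(a, b))):
        print(name, timeit.timeit(fn, number=100))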
@mruberry thanks so much for the detailed suggestions, it sounds really interesting to me.
Sounds good. Please let me know if you need more pointers or additional help.
Issue revealed by #2371, in which aten.max.other lacks a matching overload. It's caused by missing type promotion. The reason is that aten::max.other (binary max) is an alias of aten::maximum.default. Thus, when the type promotion pass dispatches torch.max through `__torch_dispatch__`, it does not find aten::max.other (however, I am not sure how `make_fx` dispatches torch.max to aten::max.other). The existence of aten::max.other looks like legacy code: pytorch/pytorch#42579.
Related to #38349
Implement NumPy-like functions maximum and minimum.

The maximum and minimum functions compare two input tensors element-wise, returning a new tensor with the element-wise maxima/minima. If one of the elements being compared is a NaN, then that element is returned. Both maximum and minimum do not support complex inputs.

This PR also promotes the overloaded versions of torch.max and torch.min, by re-dispatching binary torch.max and torch.min to torch.maximum and torch.minimum.
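A small usage sketch of the semantics described above (the outputs in the comments follow from that description):

    import torch

    a = torch.tensor([1.0, float('nan'), 3.0])
    b = torch.tensor([2.0, 0.0, 1.0])

    # Element-wise maxima/minima; a NaN in either input propagates to the result.
    print(torch.maximum(a, b))  # tensor([2., nan, 3.])
    print(torch.minimum(a, b))  # tensor([1., nan, 1.])

    # Binary torch.max / torch.min are re-dispatched to maximum / minimum.
    print(torch.max(a, b))      # tensor([2., nan, 3.])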