
Conversation

@RockingJavaBean (Contributor) commented Aug 5, 2020

Related to #38349

Implement NumPy-like functions maximum and minimum.
The maximum and minimum functions compare input tensors element-wise, returning a new tensor containing the element-wise maxima/minima.

If one of the elements being compared is a NaN, then that element is returned. Neither maximum nor minimum supports complex inputs.

This PR also updates the binary overloads of torch.max and torch.min, re-dispatching them to torch.maximum and torch.minimum.
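For illustration, the intended semantics look like this (a minimal example written for this summary, not taken from the PR's tests):

import torch

a = torch.tensor([1.0, float('nan'), 3.0])
b = torch.tensor([2.0, 0.0, float('nan')])

# NaN propagates: if either element being compared is NaN, the result is NaN.
print(torch.maximum(a, b))  # tensor([2., nan, nan])
print(torch.minimum(a, b))  # tensor([1., nan, nan])

# After this PR, the binary overload re-dispatches to the new function,
# so torch.max(a, b) matches torch.maximum(a, b).
print(torch.max(a, b))      # tensor([2., nan, nan])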

dr-ci bot commented Aug 5, 2020

💊 CI failures summary and remediations

As of commit 965b236 (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Aug 24 14:25:20     assert check_overrides(overrides, overridden) 
Aug 24 14:25:20 AssertionError 
Aug 24 14:25:20 Failed to generate ATEN bindings: ['/var/lib/jenkins/workspace/xla/scripts/generate_code.sh'] 
Aug 24 14:25:20 Building torch_xla version: 1.6 
Aug 24 14:25:20 XLA Commit ID: 47856c8c8a2d4c58836752b651909f894189afe3 
Aug 24 14:25:20 PyTorch Commit ID: 2fe33f987a49442b5516b49badc8d14fc7000bc4 
Aug 24 14:25:20 =================== sccache compilation log =================== 
Aug 24 14:25:20 + cleanup 
Aug 24 14:25:20 + retcode=1 
Aug 24 14:25:20 + set +x 
Aug 24 14:25:20 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/workspace/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:8:19: error: use of undeclared identifier \'strtod_l\'\n  return ((int*)(&strtod_l))[argc];\n                  ^\n1 error generated.\n" } 
Aug 24 14:25:20  
Aug 24 14:25:21 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Aug 24 14:25:21 Compile requests               6095 
Aug 24 14:25:21 Compile requests executed      3606 
Aug 24 14:25:21 Cache hits                     2218 
Aug 24 14:25:21 Cache misses                   1372 
Aug 24 14:25:21 Cache timeouts                    0 
Aug 24 14:25:21 Cache read errors                 0 
Aug 24 14:25:21 Forced recaches                   0 
Aug 24 14:25:21 Cache write errors                0 

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_bazel_test (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun) ❄️

Aug 24 14:02:37 [       OK ] ModulesTest.PrettyPrintGRUCell (2 ms) 
Aug 24 14:02:37 [ RUN      ] ModulesTest.PrettyPrintAdaptiveLogSoftmaxWithLoss 
Aug 24 14:02:37 [       OK ] ModulesTest.PrettyPrintAdaptiveLogSoftmaxWithLoss (1 ms) 
Aug 24 14:02:37 [----------] 241 tests from ModulesTest (226220 ms total) 
Aug 24 14:02:37  
Aug 24 14:02:37 [----------] Global test environment tear-down 
Aug 24 14:02:37 [==========] 241 tests from 1 test suite ran. (226221 ms total) 
Aug 24 14:02:37 [  PASSED  ] 241 tests. 
Aug 24 14:02:37 ================================================================================ 
Aug 24 14:03:36  
Aug 24 14:03:36 TIMEOUT: //:integration_test (Summary) 
Aug 24 14:03:36       /var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/execroot/pytorch/bazel-out/k8-fastbuild/testlogs/integration_test/test.log 
Aug 24 14:03:36 INFO: From Testing //:integration_test: 
Aug 24 14:03:36 ==================== Test output for //:integration_test: 
Aug 24 14:03:36 Running main() from gmock_main.cc 
Aug 24 14:03:36 Note: Google Test filter = -*CUDA 
Aug 24 14:03:36 [==========] Running 1 test from 1 test suite. 
Aug 24 14:03:36 [----------] Global test environment set-up. 
Aug 24 14:03:36 [----------] 1 test from IntegrationTest 
Aug 24 14:03:36 [ RUN      ] IntegrationTest.CartPole 
Aug 24 14:03:36 -- Test timed out at 2020-08-24 14:03:21 UTC -- 

This comment was automatically generated by Dr. CI.

@RockingJavaBean changed the title from "[WIP] implement NumPy-like functionality maximum, minimum" to "implement NumPy-like functionality maximum, minimum" on Aug 5, 2020
@RockingJavaBean (Contributor Author)

@mruberry This PR implements a pair of NumPy-like functions, maximum and minimum. Please help review.

@mruberry self-requested a review August 6, 2020 22:43
@mruberry added the "module: numpy" label Aug 6, 2020
}

Tensor maximum(const Tensor& self, const Tensor& other) {
Tensor result = at::empty(0, self.options());
Collaborator

Tensor result;

Contributor Author

Thanks for pointing this out. I have changed the code to Tensor result and used iter.output() for the return value.

}

Tensor minimum(const Tensor& self, const Tensor& other) {
Tensor result = at::empty(0, self.options());
Collaborator

Tensor result;

Contributor Author

Addressed.

} else {
AT_DISPATCH_COMPLEX_TYPES(iter.dtype(), "maximum_cpu", [&] {
cpu_kernel(iter, [](scalar_t a, scalar_t b) -> scalar_t {
if (_isnan(a.real()) || _isnan(a.imag())) {
Collaborator

I think isnan works on complex, see:

Tensor isnan(const Tensor& self) {

It's documented as doing so:

https://pytorch.org/docs/master/generated/torch.isnan.html?highlight=isnan#torch.isnan

This should save you from looking at the real and imaginary parts separately.
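For reference, a quick illustration of isnan on a complex tensor (illustrative, not code from the PR):

import torch

z = torch.tensor([1 + 1j, complex(float('nan'), 0.0)], dtype=torch.complex64)
# torch.isnan is True where the real or the imaginary part is NaN.
print(torch.isnan(z))  # tensor([False,  True])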

Collaborator

However... (see above)

Contributor Author

Thanks for the detailed explanation of the isnan function.
Since complex numbers are not supported, the complex-type dispatch code has been removed in the latest commit.

});
});
} else {
AT_DISPATCH_COMPLEX_TYPES(iter.dtype(), "maximum_cpu", [&] {
Collaborator

Comparisons between complex numbers aren't well-defined mathematically and we prohibit them elsewhere (NumPy is updating their functionality to conform to ours). I would think that means that we shouldn't support maximum and minimum for complex tensors?

Contributor Author

Thanks for the suggestions. I have added checks to reject complex inputs and deleted the redundant complex dispatch code.

}
}

void minimum_kernel(TensorIterator& iter) {
Collaborator

See symmetric comments above.

Contributor Author

addressed


namespace at { namespace native {

void maximum_kernel_cuda(TensorIterator& iter) {
Collaborator

See symmetric comments above.

Contributor Author

addressed

AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.input_dtype(), "maximum_cuda", [&]() {
gpu_kernel(iter, [] GPU_LAMBDA (scalar_t a, scalar_t b) -> scalar_t {
// isnan(half) breaks the Windows build. We explicitly cast half to float.
using acc_type = typename AccumulateType<scalar_t, /*is_cuda=*/true>::type;
Collaborator

Why use the accumulate type here?

Contributor Author

I was referring to the following code from MaxMinElementwiseKernel.cu, which prevents isnan from breaking for half types on the Windows platform.

gpu_kernel(iter, []GPU_LAMBDA(scalar_t a, scalar_t b) -> scalar_t {
// isnan(half) breaks the Windows build. We explicitly cast half to float.
using acc_type = typename AccumulateType<scalar_t, /*is_cuda=*/true>::type;
// We avoid using nan or nanf because we want to return the same type as scalar_t.
if (::isnan(static_cast<acc_type>(a))) {
return a;
} else if (::isnan(static_cast<acc_type>(b))) {

Thanks to the suggestion to use a != a directly, the accumulate type is no longer needed.

} else if (isFloatingType(iter.dtype())) {
AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.input_dtype(), "minimum_cuda", [&]() {
gpu_kernel(iter, [] GPU_LAMBDA (scalar_t a, scalar_t b) -> scalar_t {
// isnan(half) breaks the Windows build. We explicitly cast half to float.
Collaborator

Would (a != a) work?

Contributor Author

Yes, it works; I have used a != a directly to check for NaNs.
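For reference, a != a relies on the IEEE 754 rule that NaN is the only value that compares unequal to itself, so it doubles as a NaN test without isnan (a minimal illustration, not the PR's kernel code):

import torch

for dtype in (torch.float16, torch.bfloat16, torch.float32):
    a = torch.tensor([1.0, float('nan')], dtype=dtype)
    # (a != a) is True exactly where a is NaN, matching torch.isnan(a).
    print(dtype, a != a)  # tensor([False,  True])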

variants: method, function

- func: maximum.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
dispatch:
Collaborator

I don't think you need to be explicit about this dispatch. I believe it will automatically dispatch to maximum_out.

Contributor Author

Thanks for pointing that out, I have removed the redundant dispatch sections.

variants: method, function

- func: minimum.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
dispatch:
Collaborator

Same note on dispatch here.

Contributor Author

addressed.


@unittest.skipIf(not TEST_NUMPY, "Numpy not found")
@dtypes(*(torch.testing.get_all_int_dtypes() + [torch.bool]))
def test_maximum_minimum_int_and_bool(self, device, dtype):
Collaborator

This test is already very good. You could improve it by generating the NumPy arrays using https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html?highlight=integers#numpy.random.Generator.integers instead of handwriting a tuple of values. The arrays can then be converted to torch tensors with torch.from_numpy(a).to(device=device).

Contributor Author

Thanks for the suggestions. I have updated test_maximum_minimum_int_and_bool to let NumPy generate random inputs.
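A minimal sketch of the suggested pattern (the names, sizes, and bounds below are illustrative, not the PR's exact test code):

import numpy as np
import torch

rng = np.random.default_rng()
a_np = rng.integers(low=-100, high=100, size=10, dtype=np.int64)
b_np = rng.integers(low=-100, high=100, size=10, dtype=np.int64)

# Convert the NumPy-generated inputs to torch tensors on the target device.
a = torch.from_numpy(a_np).to(device='cpu')
b = torch.from_numpy(b_np).to(device='cpu')

# The torch result should match the NumPy reference element-wise.
assert torch.equal(torch.maximum(a, b).cpu(), torch.from_numpy(np.maximum(a_np, b_np)))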

b_np = np.array(b_vals, dtype=torch_to_numpy_dtype_dict[dtype])
numpy_result = numpy_op(a_np, b_np)

self.assertEqual(tensor_result.cpu(), torch.from_numpy(numpy_result))
Collaborator

I think you can compare directly to the NumPy array without converting it to a torch.Tensor.

Contributor Author

Addressed.

# np.maximum and np.minimum compare input arrays element-wise.
# If one of the elements being compared is a NaN, then that element is returned.
ops = ((torch.maximum, np.maximum), (torch.minimum, np.minimum))
a_vals = (float('inf'), -float('inf'), 2.0, float('nan'), -1.0, float('nan'))
Collaborator

You could also simplify this test using np.random.randn to generate a float tensor, if you like. You would still have to generate an "extremal values" test case with -inf, inf, and nan by hand, though. You can also simplify by just comparing the torch and NumPy results; you don't need to assert the results are correct.

To handle bfloat16 you can just cast the generated arrays to the desired torch type (leaving the arrays in double) and then compare using self.assertEqual(actual, expected, exact_dtype=False).

Contributor Author

Thanks for the suggestions, I have split the test cases into two methods:

  • test_maximum_minimum_float: uses np.random.randn to generate random float inputs.
  • test_maximum_minimum_float_nan_and_inf: tests extremal values like nan, -inf, and inf.

For bfloat16, I had to set rtol and atol as below to prevent precision issues:

self.assertEqual(tensor_result, numpy_result, rtol=0.016, atol=1e-5, exact_dtype=False)

The values of rtol and atol follow the dtype_precisions table in the common_utils.py file:

# dtype name : (rtol, atol)
dtype_precisions = {
torch.float16 : (0.001, 1e-5),
torch.bfloat16 : (0.016, 1e-5),


@unittest.skipIf(not TEST_NUMPY, "Numpy not found")
@dtypes(torch.complex64, torch.complex128)
def test_maximum_minimum_complex(self, device, dtype):
Collaborator

See above comments: should we support complex?

Contributor Author

The test cases for complex numbers have been updated.
The current test cases check whether an error is raised when either input is a complex number.
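A standalone sketch of such a check (hypothetical test code written for this summary; the class and method names are not the PR's):

import unittest
import torch

class TestMaximumMinimumComplex(unittest.TestCase):
    def test_complex_inputs_raise(self):
        a = torch.tensor([1 + 1j, 2 + 2j], dtype=torch.complex64)
        b = torch.tensor([3 + 3j, 4 + 4j], dtype=torch.complex64)
        # Both ops are expected to reject complex inputs with a RuntimeError.
        for op in (torch.maximum, torch.minimum):
            with self.assertRaises(RuntimeError):
                op(a, b)

if __name__ == '__main__':
    unittest.main()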

.. note::
If one of the elements being compared is a NaN, then that element is returned.
If both elements are NaNs then the first is returned.
Collaborator

This part of the note may be removed.

Contributor Author

The sentence "If both elements are NaNs then the first is returned." is removed, and the whole note is now changed to match the minimum function's.
Please kindly confirm.

@mruberry (Collaborator) left a comment

Nice teamwork!

@facebook-github-bot (Contributor) left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry (Collaborator)

Just waiting on resolution of pytorch/xla#2424 now.

@JackCaoG (Collaborator)

Hi @RockingJavaBean, is it possible to rebase this branch onto the pytorch master? I submitted pytorch/xla#2438 to resolve the xla issue and pinned it to this PR as the pytorch version. However, #43047 is missing from this branch and the xla_build will fail. I patched this PR and tested my change locally and everything seems fine, but it is still better to let CircleCI do its job.

@RockingJavaBean (Contributor Author)

@JackCaoG thanks for helping out with the XLA issue.
I have rebased onto the PyTorch master, and the commit of #43047 is now included in this branch.
Please let CircleCI retest the xla_build.

@JackCaoG (Collaborator)

@RockingJavaBean Thanks! I saw all tests passed on XLA:CPU but some failed on XLA:GPU; this is most likely due to different handling of nan on XLA:GPU. I will check those tomorrow.

@mruberry (Collaborator)

Thanks for helping out, @JackCaoG!

@RockingJavaBean (Contributor Author)

RockingJavaBean commented Aug 24, 2020

@mruberry The assertRaises check for type promotion between half and bfloat16 was removed from the test via 965b236, as that promotion has been implemented in #43324.
Please kindly confirm and help land this PR.
It may also help @JackCaoG land the related xla PR pytorch/xla#2424.

@facebook-github-bot (Contributor) left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@mruberry merged this pull request in 033b7ae.

@mruberry (Collaborator)

Great work all! Especially @ailzhang and @JackCaoG for taking the time to help us out with PyTorch/XLA.

@RockingJavaBean, let me know if you'd like some suggestions for other PyTorch issues or features.

@RockingJavaBean deleted the numpy_maximum_minimum branch August 27, 2020 16:23
@RockingJavaBean (Contributor Author)

@mruberry I'm so grateful for your kind help and patient guidance throughout this PR.
I would love to have your suggestions for other issues or features and to continue contributing to PyTorch.

@mruberry (Collaborator)

mruberry commented Aug 28, 2020

@mruberry I'm so grateful for your kind help and patient guidance throughout this PR.
I would love to have your suggestions for other issues or features and to continue contributing to PyTorch.

Happy to help you find another project, @RockingJavaBean. Would you like something simpler or more challenging, another function like maximum and minimum or something different, like a function that would use existing PyTorch functions to manipulate tensors, or a function with more challenging numeric and/or performance issues?

If you could suggest what you're interested in it'll help me think of a fun project.

@RockingJavaBean (Contributor Author)

@mruberry thank you so much for the advice; I would like to try more challenging issues.
Please suggest numeric or performance issues that I can start working on.

@mruberry (Collaborator)

@mruberry thank you so much for the advice; I would like to try more challenging issues.
Please suggest numeric or performance issues that I can start working on.

OK, let me suggest a project and think about whether this would be interesting.

In PyTorch we have torch.linalg.outer, which is an alias for torch.ger, a function which computes the outer product of two vectors. The CPU implementation for this is very old. So old, in fact, that it's in the TH tensor library (see void THBlas_(ger)), not the more modern ATen tensor library.

There are several challenges with this function:

  • removing the TH code and moving the function's implementation to ATen
  • improving the test coverage
  • optimizing its performance
  • extending it to complex dtypes

These could even be done in 2-3 PRs (handling complex dtypes should definitely be its own PR, the other issues could be combined).

This would require benchmarking the current BLAS-based solution with a new alternative (like reusing matmul) on various sizes and dtypes, as well as understanding the current code while writing a new, possibly faster implementation.

I think this is a really interesting project, but if it's not what you had in mind let me know and I can suggest something else. The goal here would be to demonstrate a faster implementation for PyTorch's CPU outer product and modernize the code. Follow-up work could include handling complex vectors or reviewing the CUDA outer product implementation.
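For context, the outer product can already be expressed with existing PyTorch ops, which is one candidate for the benchmark comparison described above (the sizes and timing method below are illustrative):

import timeit
import torch

a = torch.randn(1000)
b = torch.randn(1000)

# torch.ger and a broadcasted multiply compute the same outer product.
assert torch.allclose(torch.ger(a, b), a.unsqueeze(1) * b.unsqueeze(0))

for label, fn in (('torch.ger', lambda: torch.ger(a, b)),
                  ('broadcast', lambda: a.unsqueeze(1) * b.unsqueeze(0))):
    print(label, timeit.timeit(fn, number=100), 'seconds for 100 runs')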

@RockingJavaBean (Contributor Author)

RockingJavaBean commented Aug 28, 2020

@mruberry thanks so much for the detailed suggestions; it sounds really interesting to me.
I am going to read the TH-to-ATen porting guide for porting the outer product first.
Then I will try to improve the test coverage and performance, as well as handle the complex types, in follow-up PRs.

@mruberry (Collaborator)

@mruberry thanks so much for the detailed suggestions; it sounds really interesting to me.
I am going to read the TH-to-ATen porting guide for porting the outer product first.
Then I will try to improve the test coverage and performance, as well as handle the complex types, in follow-up PRs.

Sounds good. Please let me know if you need more pointers or additional help.

titaiwangms added a commit to microsoft/onnxscript that referenced this pull request Jun 13, 2025

Issue revealed by #2371, in which aten.max.other lacks a matching overload. It's caused by missing type promotion. The reason is that aten::max.other (binary max) is an alias of aten::maximum.default. Thus, when the type promotion pass dispatches torch.max through `__torch_dispatch__`, it does not find aten::max.other (however, I am not sure how `make_fx` dispatches torch.max to aten::max.other).

The existence of aten::max.other looks like legacy code:
pytorch/pytorch#42579.

Labels

Merged · module: numpy · open source · triaged
