Add torch.vdot
#43004
Conversation
💊 CI failures summary and remediations: as of commit 66ee099 (Dr. CI), 💚 looks good so far! There are no failures yet. 💚
// For complex dtypes.
dot_check(self, other);
return AT_DISPATCH_COMPLEX_TYPES(self.scalar_type(), "vdot", [&] {
  Tensor result = at::empty({}, self.options());
at::empty(0, ...
This actually trips up experienced PyTorch developers all the time. If we take a look at empty:
Tensor empty_cpu(IntArrayRef size, const TensorOptions& options_, c10::optional<c10::MemoryFormat> optional_memory_format) {
the size comes from here:
int64_t nelements = prod_intlist(size);
which is computed using (Line 96 in c9a1fc2):
inline int64_t prod_intlist(ArrayRef<int64_t> list) {
And if we run the following program:
std::cout << prod_intlist({}) << std::endl; // prints 1
std::cout << prod_intlist(0) << std::endl; // prints 0
it will print 1, 0. It is kinda confusing that an empty initializer list produces a tensor with one element and putting a zero in produces a tensor with no elements, but I try to remember it by recalling that zero specifies the size.
Oh wait. You're immediately populating the value into result, not resizing it (which I assumed since it's such a common pattern). You're doing everything right.
(Although what about empty(1, ... for readability?)
empty({}) produces a 0-dim tensor with 1 element, empty({1}) produces a 1-dim tensor with 1 element. vdot has to return a 0-dim tensor, so empty({}) is correct.
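For readers following along, the same size semantics are easy to check from the Python side (illustrative snippet only, not part of the PR):
>>> torch.empty(()).shape   # empty size: 0-dim tensor with 1 element
torch.Size([])
>>> torch.empty(1).shape    # size [1]: 1-dim tensor with 1 element
torch.Size([1])
>>> torch.empty(0).shape    # size [0]: 1-dim tensor with 0 elements
torch.Size([0])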
test/test_torch.py
@dtypes(torch.float, torch.double, torch.cfloat, torch.cdouble)
def test_vdot(self, device, dtype):
    def compare_with_numpy_bin_op(torch_fn, np_fn, x, y, relaxed_tolerance=False):
        if self.device_type == 'cuda':
Edited.
To fix the XLA issue and simplify the code just always do:
y_np = y.cpu().numpy()
There's no harm in calling .cpu() on a CPU tensor.
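A minimal sketch of the simplified helper under that suggestion (nested inside test_vdot so that self refers to the test case; the exact assertions in the PR may differ):
def compare_with_numpy_bin_op(torch_fn, np_fn, x, y):
    # .cpu() is a no-op for CPU tensors and copies CUDA/XLA tensors to host,
    # so the same code path works for every device type.
    x_np = x.cpu().numpy()
    y_np = y.cpu().numpy()
    self.assertEqual(torch_fn(x, y).item(), np_fn(x_np, y_np))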
mruberry
left a comment
Overall looks really good as usual, @kshitij12345!
What about adding a derivative, like we have for dot:
pytorch/tools/autograd/derivatives.yaml (Line 395 in eb47940):
- name: dot(Tensor self, Tensor tensor) -> Tensor
And updating method_tests():
('dot', (L,), ((L,),), '', (True,)),
to test it?
@anjali411 for help with the derivative.
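For reference, the corresponding vdot entry in method_tests() would presumably mirror the dot entry quoted above (a sketch, not necessarily the exact tuple the PR ends up adding):
('vdot', (L,), ((L,),), '', (True,)),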
test/test_torch.py
self.assertEqual(res1, out)

@dtypes(torch.float, torch.double, torch.cfloat, torch.cdouble)
def test_vdot(self, device, dtype):
Would you also add tests for:
- dot and vdot getting arguments of different dtypes, an incorrect number of dimensions, and mismatched devices (this test is a little weird to write in the device generic framework, but you can check that your current device is a cuda device and then create a tensor on that device + a cpu tensor to do it; a rough sketch follows after this list)
- after the out variant is added, including mismatched out dtype
- after the method variant is added, you can add a test here, too:
Line 19786 in eb47940:
('dot', '', _medium_1d, lambda t, d: [_medium_1d(t, d)],
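A hypothetical sketch of the error-case tests from the first bullet (the test name, tensor shapes, and reliance on plain RuntimeError are assumptions, not the PR's actual code):
def test_dot_vdot_invalid_args(self, device):
    x = torch.randn(3, dtype=torch.float, device=device)
    y = torch.randn(3, dtype=torch.double, device=device)
    z = torch.randn(2, 3, dtype=torch.float, device=device)

    for fn in (torch.dot, torch.vdot):
        # mismatched dtypes
        with self.assertRaises(RuntimeError):
            fn(x, y)
        # wrong number of dimensions (1-D tensors are expected)
        with self.assertRaises(RuntimeError):
            fn(x, z)
        # mismatched devices: only meaningful when the current device is a CUDA device
        if self.device_type == 'cuda':
            with self.assertRaises(RuntimeError):
                fn(x, x.cpu())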
torch/_torch_docs.py
| r""" | ||
| vdot(x, y) -> Tensor | ||
| Computes the dot product (inner product) of two tensors. |
What about something like:
"Computes the dot product (inner product) of input and the complex conjugate of other."
And not having to explain in a note how vdot is distinct from dot?
I think we should include the note in the description because it brings more clarity to the description.
"Computes the dot product (inner product) of two tensors. The vdot(a, b) function handles complex numbers differently than dot(a, b). If the first argument is complex the complex conjugate of the first argument is used for the calculation of the dot product."
May still need to inline this note per @anjali411's feedback. I'll let her have final say over doc string.
@kshitij12345 nit -- can you change the (new line) formatting to:
"Computes the dot product (inner product) of two tensors. The vdot(a, b) function
handles complex numbers differently than dot(a, b). If the first argument is complex
the complex conjugate of the first argument is used for the calculation of the dot product."
Done. Thanks!
Tensor vdot_cuda(const Tensor& self, const Tensor& other) {
  if (!self.is_complex()) {
    return dot_cuda(self, other);
>>> np.vdot(np.array([2]), np.array([2+3j]))
(4+6j)
>>> np.vdot(np.array([2+3j]), np.array([2]))
(4-6j)
Maybe it's worth adding a comment that we only call dot_cuda when self is not complex because we want the above-mentioned behavior.
nvm I guess we throw an error for input tensors of different dtype:
>>> torch.dot(torch.tensor([1j]), torch.tensor([2]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dot : expected both vectors to have same dtype, but found ComplexFloat and Long
I think we should add type promotion for dot and vdot to better align with numpy.
cc. @mruberry
Type promotion would be nice but I think it's OK if it's not in this first PR. Type promotion is a little tricky to implement today when not using TensorIterator.
Yeah, I was thinking about it. But since the plan is to use the existing dot for real types, I wasn't sure how to go about adding type promotion.
Another place where it diverges from numpy is that the numpy operator supports broadcasting while dot doesn't. We can add broadcasting logic before passing the inputs to dot. What do you think about it?
I would file follow-up issues for broadcasting + type promotion. We may even want to add architecture to support doing these things easily outside of TensorIterator.
numpy vdot does not support broadcasting, and we don't try to follow numpy dot behavior:
In [5]: np.vdot([2,3], [3])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-6192cfa76e6f> in <module>
----> 1 np.vdot([2,3], [3])
<__array_function__ internals> in vdot(*args, **kwargs)
ValueError: cannot reshape array of size 1 into shape (2,)
@mruberry yeah that sounds good. out of curiosity -- what's tricky about implementing type promotion when not using TensorIterator?
if (!self.is_complex()) {
  return dot_cuda(self, other);
}
will give us the desired behavior for a (real, complex) dot product when type promotion is enabled, so we should still add a note documenting that. This is so that when type promotion is enabled in the future, we have the behavior already documented.
Type promotion challenges:
- sometimes have to be careful to preserve your inputs (luckily not the case here)
- compute the result type
- cast inputs to the result type
- validate safe casting to out
- copy to out (if necessary)
- write a custom test for your op's type promotion behavior
It's not the end of the world. In this case we would want to change the behavior of dot, too, so it seems separable.
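As a rough illustration of those steps for an op that does not go through TensorIterator, here is a Python-level sketch (hypothetical helper, not the PR's code; the real implementation would live in C++):
import torch

def vdot_with_promotion(x, y, out=None):
    # compute the result type and cast both inputs to it
    common = torch.result_type(x, y)
    result = torch.vdot(x.to(common), y.to(common))
    if out is not None:
        # validate safe casting to out, then copy the result into it
        if not torch.can_cast(common, out.dtype):
            raise RuntimeError("result type {} can't be cast to the desired output type {}"
                               .format(common, out.dtype))
        out.copy_(result)  # assumes out already has the right (0-dim) shape
        return out
    return result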
anjali411
left a comment
I left some minor comments. LGTM overall :)
- We should add a not_implemented definition in derivatives.yaml until the JAX vs TF issue is resolved:
- name: vdot(Tensor self, Tensor other) -> Tensor
  self: 'not_implemented("vdot: self")'
  other: 'not_implemented("vdot: other")'
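With an entry like that, the forward call still works but autograd should refuse to differentiate through it. A hedged sketch of the expected behavior (assuming the entry above lands as written):
a = torch.randn(3, requires_grad=True)
b = torch.randn(3)
out = torch.vdot(a, b)
out.backward()  # expected to raise a RuntimeError stating the derivative of vdot is not implemented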
Gentle ping :)
ROCm CI passed with latest changes. LGTM.
You also need to update tensors.rst like torch.rst.
Thanks, had missed that. Have also updated the description of the function.
facebook-github-bot
left a comment
@anjali411 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Revert "Skips some complex tests on ROCm (pytorch#42759)" This reverts commit 55b1706. Use new cuda_to_hip_mappings.py from pytorch#43004.
Summary: Revert "Skips some complex tests on ROCm (#42759)". This reverts commit 55b1706. Use new cuda_to_hip_mappings.py from #43004. Fixes #42383 (comment) CC sunway513 Pull Request resolved: #43744 Reviewed By: glaringlee Differential Revision: D23391263 Pulled By: ngimel fbshipit-source-id: ddf734cea3ba69c24f0d79cf1b87c05cdb45ec3d
Gentle Ping :)
mruberry
left a comment
Ensuring Phabricator diff is stamped.
@kshitij12345 thanks for the reminder! The FB tests for this PR were failing, hence it took a while. Can you rebase the PR?
Codecov Report

@@            Coverage Diff            @@
##           master   #43004    +/-   ##
========================================
  Coverage   69.29%   69.29%
========================================
  Files         379      379
  Lines       47036    47038     +2
========================================
+ Hits        32592    32594     +2
  Misses      14444    14444

Continue to review full report at Codecov.
@anjali411 merged this pull request in b6b5ebc.
Fixes #42747