
Conversation

@mruberry (Collaborator) commented Aug 12, 2020:

This PR:

  • updates div to perform true division
  • makes torch.true_divide an alias of torch.div

This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error.
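
A minimal sketch of the behavior change described above (a hedged illustration assuming a build that includes this PR; the tensor values are illustrative):

import torch

a = torch.tensor([5, 3])        # int64 tensor
b = torch.tensor([2, 2])        # int64 tensor

# div now always performs true division, so integer inputs produce a floating-point result
print(torch.div(a, b))          # tensor([2.5000, 1.5000])

# true_divide is an alias of div and returns the same result
print(torch.true_divide(a, b))  # tensor([2.5000, 1.5000])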

@mruberry mruberry requested a review from apaszke as a code owner August 12, 2020 08:52
@facebook-github-bot facebook-github-bot added the oncall: jit label Aug 12, 2020
@dr-ci bot commented Aug 12, 2020:

💊 CI failures summary and remediations

As of commit 7744f2b (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

XLA failure

Job pytorch_xla_linux_bionic_py3_6_clang9_build is failing. Please create an issue with title prefixed by [PT_BREAK] in pytorch/xla and link to this PR. If you have questions, please reach out to @ailzhang / @dlibenzi / @JackCaoG.



@mruberry mruberry removed the request for review from apaszke August 13, 2020 03:39
@mruberry (Collaborator Author):

@ailzhang Looks like this change is breaking XLA because it changes torch.true_divide to be an alias for torch.div. It will also change the behavior of torch.div to always perform true division.

@ailzhang (Contributor):

cc: @JackCaoG This requires updating torch.div behavior on the XLA side; would you mind taking a look?

@mruberry (Collaborator Author):

> cc: @JackCaoG This requires updating torch.div behavior on the XLA side; would you mind taking a look?

ping @JackCaoG. We'd like to get this fix in sooner rather than later.

@JackCaoG (Collaborator) commented Aug 21, 2020:

@mruberry Sorry, missed this email. I will take a look tomorrow. Feel free to ping me on Slack if I don't respond to these kinds of failures within a day 😄, or open an issue on pt/xla, which I check more frequently.

@mruberry (Collaborator Author):

> @mruberry Sorry, missed this email. I will take a look tomorrow. Feel free to ping me on Slack if I don't respond to these kinds of failures within a day 😄.

haha, thanks @JackCaoG, but it's not "next day" urgent! It's like "by next week" urgent, if that works for you? Should be a small fix.

@JackCaoG (Collaborator):

@mruberry Sounds good, I will work on it tomorrow.

@JackCaoG (Collaborator):

@mruberry Currently I am seeing

>>> t1
tensor(10)
>>> t1/2
tensor(5.)
>>> t1/=2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: result type Float can't be cast to the desired output type Long

I think the intent was to make div_ also perform true division, and that this runtime error will be removed in a follow-up PR?
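
For reference, a minimal sketch of why the in-place case errors while the out-of-place case works (assuming standard type-promotion rules; the dtypes shown are illustrative):

import torch

t1 = torch.tensor(10)                               # 0-dim int64 tensor
print((t1 / 2).dtype)                               # torch.float32: true division promotes to floating point

# An in-place op must write the result back into t1's int64 storage,
# and an implicit float -> long cast is not allowed:
print(torch.can_cast(torch.float32, torch.int64))   # False, hence the RuntimeError above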

@JackCaoG (Collaborator):

@mruberry Could you rebase this PR 😄? I pinned my PR to this PR, but I need #43047 to build PT/XLA.

@mruberry mruberry changed the title [WIP] Updates div to perform true division Updates div to perform true division Aug 23, 2020
@mruberry (Collaborator Author):

> @mruberry Could you rebase this PR 😄? I pinned my PR to this PR, but I need #43047 to build PT/XLA.

Thanks @JackCaoG - rebased!

@JackCaoG (Collaborator):

Hey @mruberry I am running into an issue where if I do

import torch, torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()
t1 = torch.tensor(10, device=device)
print(torch.div(t1, 2.0))

the two-tensor version of divide, at::Tensor AtenXlaType::div(const at::Tensor& self, const at::Tensor& other), is called instead of the scalar version, at::Tensor AtenXlaType::div(const at::Tensor& self, at::Scalar other). The const at::Tensor& other argument has dtype float64 instead of float32. Is this expected?

@mruberry (Collaborator Author):

> Hey @mruberry I am running into an issue where if I do
>
> import torch, torch_xla
> import torch_xla.core.xla_model as xm
>
> device = xm.xla_device()
> t1 = torch.tensor(10, device=device)
> print(torch.div(t1, 2.0))
>
> the two-tensor version of divide, at::Tensor AtenXlaType::div(const at::Tensor& self, const at::Tensor& other), is called instead of the scalar version, at::Tensor AtenXlaType::div(const at::Tensor& self, at::Scalar other). The const at::Tensor& other argument has dtype float64 instead of float32. Is this expected?

No it's not. Let me do some local testing. I thought we had tests for this. I also need to rebase this.
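
For comparison, a minimal sketch of the expected CPU behavior (the Python scalar should participate in type promotion as a "weak" float, so the result should be the default float32 rather than float64; values are illustrative):

import torch

t1 = torch.tensor(10)               # int64
print(torch.div(t1, 2.0).dtype)     # torch.float32 on CPU: the scalar does not force float64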

@mruberry (Collaborator Author) commented Sep 8, 2020:

@JackCaoG I believe I fixed the issue you were seeing. This PR is also rebased - sorry there was a holdup.

@mruberry mruberry requested a review from gchanan September 8, 2020 15:12
- func: true_divide.Tensor(Tensor self, Tensor other) -> Tensor
use_c10_dispatcher: full
variants: function, method
dispatch:
@mruberry (Collaborator Author):


Does setting dispatch override the gradient? Check what test would catch this.

Delete dispatches.

@mruberry (Collaborator Author):


Interesting follow-up: we probably want to add gradcheck to test_op_aliases; method_tests entries are also missing for true_divide and floor_divide.

@mruberry (Collaborator Author):


Explicitly dispatching was causing autograd to fail. Deleting the dispatches works. Unfortunately aliases can't be added to method_tests() without additional test changes because test_jit.py has a check where it looks for the node, by name, but the node doesn't appear in the JIT's graph (because it's been aliased to another op). In the near future aliases will be refactored into OpInfos and gradcheck'd automatically there.
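
A minimal sketch of the kind of gradcheck coverage mentioned above (a hedged example, not the actual test code; shapes and values are illustrative):

import torch
from torch.autograd import gradcheck

x = torch.rand(3, 3, dtype=torch.double, requires_grad=True)
y = torch.rand(3, 3, dtype=torch.double) + 0.1   # keep the divisor away from zero

# Compare analytical and numerical gradients of true_divide with respect to x
assert gradcheck(torch.true_divide, (x, y))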

Performs a "true" division like Python 3. See :func:`torch.floor_divide`
for floor division.
Supports :ref:`broadcasting to a common shape <broadcasting-semantics>`,
@mruberry (Collaborator Author):


TODO: check if scalar semantics are covered

@mruberry (Collaborator Author):


The broadcasting note DOES NOT describe this, and the note is in need of an update. I added a note to our Docs Hackathon planner.
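
A minimal sketch of the broadcasting and scalar semantics the docstring refers to (a hedged illustration; shapes and values are illustrative):

import torch

x = torch.rand(3, 4)
y = torch.rand(4)              # broadcasts against x along the leading dimension
print(torch.div(x, y).shape)   # torch.Size([3, 4])

print(torch.div(x, 2).dtype)   # scalar divisor: result keeps x's floating dtype (torch.float32)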

@JackCaoG (Collaborator) commented Sep 8, 2020:

@mruberry I am still seeing the same behavior, running

import torch, torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()
t1 = torch.tensor(10, device=device)
print(torch.div(t1, 2.0))

gives me

div two tensor
div two tensor
tensor(5., device='xla:0', dtype=torch.float64)

I am using the most recent div_true branch and I also rebased my pt/xla branch:

(pytorch2) jackcao@jackcao64:~/Desktop/gitProj2/pytorch$ git status
On branch div_true
Your branch is up to date with 'origin/div_true'.

Could you give it another look?

@mruberry (Collaborator Author) commented Sep 9, 2020:

> @mruberry I am using the pytorch head against the pytorch/xla head to run the same code and I am seeing the same behavior.
>
> torch.div(t1, 2.0)
>
> will call AtenXlaType::div(const at::Tensor& self, const at::Tensor& other) instead of AtenXlaType::div(const at::Tensor& self, at::Scalar other). This behavior is unrelated to this PR, but it does seem wrong.

I agree with you but I'm glad we've verified it's not a regression. Want to file a follow-up issue on the PyTorch/XLA side?

How does this PR look otherwise?

@JackCaoG (Collaborator) commented Sep 9, 2020:

> > @mruberry I am using the pytorch head against the pytorch/xla head to run the same code and I am seeing the same behavior.
> >
> > torch.div(t1, 2.0)
> >
> > will call AtenXlaType::div(const at::Tensor& self, const at::Tensor& other) instead of AtenXlaType::div(const at::Tensor& self, at::Scalar other). This behavior is unrelated to this PR, but it does seem wrong.
>
> I agree with you but I'm glad we've verified it's not a regression. Want to file a follow-up issue on the PyTorch/XLA side?
>
> How does this PR look otherwise?

This PR looks good. I need to fix or work around the type promotion issue caused by the scalar-to-tensor call.
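
A minimal sketch of the type-promotion difference at play when a Python scalar is materialized as a float64 tensor before dispatch (a hedged illustration; dtypes shown are illustrative):

import torch

t = torch.tensor(10)                                                  # int64
print(torch.result_type(t, 2.0))                                      # torch.float32: a Python scalar promotes "weakly"
print(torch.result_type(t, torch.tensor(2.0, dtype=torch.float64)))   # torch.float64: a real tensor's dtype wins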

@facebook-github-bot (Contributor) left a comment:


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

}
}

TensorIterator TensorIterator::binary_op(Tensor& out, const Tensor& a,
@mruberry (Collaborator Author):


binary_op_float

('div', (S, S, S), (uniform_scalar(0.1),), 'scalar_broadcast_rhs', (True,)),
('div', (), (uniform_scalar(0.1),), 'scalar_broadcast_lhs', (True,)),
('div', torch.rand(S, S, S) + 1e-1, (3.14,), 'constant', (True,)),
('div', uniform_scalar(1e-1, requires_grad=True), (3.14,), 'scalar_constant', (True,)),
@mruberry (Collaborator Author):


add true_divide and ignore in jit (for now) with comment

self.assertEqual(div_result, dividend.clone().true_divide_(divisor))
self.assertEqual(dividend.clone().div_(2), dividend.clone().true_divide_(2))
def test_div_promotion_inplace(self, device, dtype):
for op in (torch.Tensor.div_, torch.Tensor.true_divide_):
Contributor:


would be nicer if you could just assert div_ is true_divide_ eh?

@mruberry (Collaborator Author):


It would be. We could still set the functions equal in Python and C++.
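
A minimal sketch of the comparison-based approach the test takes instead (a hedged illustration; tensor values are illustrative):

import torch

dividend = torch.tensor([4.0, 6.0])
divisor = torch.tensor([2.0, 3.0])

# div_ and true_divide_ are not currently the same Python object, so the test
# checks that both in-place ops produce identical results instead:
for op in (torch.Tensor.div_, torch.Tensor.true_divide_):
    result = op(dividend.clone(), divisor)
    assert torch.equal(result, dividend / divisor)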

@mruberry (Collaborator Author):

@JackCaoG I was planning to land this on Monday, that OK with you? I can start the land in the morning and I would expect it to land before noon.

@JackCaoG (Collaborator):

> @JackCaoG I was planning to land this on Monday, that OK with you? I can start the land in the morning and I would expect it to land before noon.

Sounds good

@mruberry (Collaborator Author):

> > @JackCaoG I was planning to land this on Monday, that OK with you? I can start the land in the morning and I would expect it to land before noon.
>
> Sounds good

Thanks @JackCaoG!

@facebook-github-bot (Contributor) left a comment:


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry (Collaborator Author):

@JackCaoG This is merged (finally). Sorry it's later than expected due to some internal test slowdowns today.

@JackCaoG (Collaborator):

@mruberry thanks for the reminder, I merged the xla-side change too.

@facebook-github-bot (Contributor):

@mruberry merged this pull request in 686e281.

xuzhao9 pushed a commit that referenced this pull request Sep 18, 2020
Summary:
This PR:

- updates div to perform true division
- makes torch.true_divide an alias of torch.div

This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error.

Pull Request resolved: #42907

Reviewed By: ngimel

Differential Revision: D23622114

Pulled By: mruberry

fbshipit-source-id: 414c7e3c1a662a6c3c731ad99cc942507d843927
@facebook-github-bot facebook-github-bot deleted the div_true branch January 27, 2021 18:26