
Conversation

@mruberry
Collaborator

@mruberry commented Jul 26, 2020

See #41027.

This adds a helper for resizing outputs to ATen/native/Resize.* and updates TensorIterator to use it. The helper emits a warning if a tensor with one or more elements needs to be resized; the warning notes that these resizes will become an error in a future PyTorch release.
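As a user-level illustration (editor's sketch; the exact warning text and the operator used are incidental), the deprecated pattern is an out= tensor that already has elements but the wrong shape:

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)

# `out` has elements but the wrong shape, so the operation must resize it;
# with this change that resize emits a deprecation warning.
out = torch.empty(5)
torch.add(a, b, out=out)

# A zero-element `out` may still be resized silently.
out = torch.empty(0)
torch.add(a, b, out=out)  # no warning
```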

There are many functions in PyTorch that resize their outputs but don't use TensorIterator. For example:

output.resize_({batch_size, n_output_plane, output_height, output_width});

These functions will need to be updated to use this helper, too. This PR leaves them out since the work is separable, which should let review focus on the helper and its behavior. A TODO in the code reflects this.

@dr-ci

dr-ci bot commented Jul 26, 2020

💊 CI failures summary and remediations

As of commit a4332dc (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@mruberry changed the title from "[WIP] Warns when TensorIterator would resize its output" to "Warns when TensorIterator would resize its output" on Jul 27, 2020
@mruberry requested review from ezyang, gchanan and ngimel on July 27, 2020 00:40
"shape ", output.sizes(), ", which does not match the required ",
"output shape ", shape, ".",
"This behavior is deprecated, and in a future PyTorch release outputs ",
"will not be resized unless they have zero elements.");
Contributor


I generally like to tell users how to fix warnings in the warning message. Wondering if we can give more specific guidance in the case when output.numel() == shape.numel(). In this case, we could tell the user to explicitly view their output so that the warning goes away.
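For example (editor's sketch of that suggestion; names and the operator are illustrative), when the element counts already match, the user can pass a correctly shaped view instead of letting the operation resize out:

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)

out = torch.empty(6)                  # same numel as the (2, 3) result, different shape
torch.add(a, b, out=out)              # would warn: out must be resized to (2, 3)

torch.add(a, b, out=out.view(2, 3))   # explicit view of the same storage: no resize, no warning
```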

@ezyang
Contributor

ezyang commented Jul 27, 2020

Is the test suite warnings clean after this change?

@ezyang
Contributor

ezyang commented Jul 27, 2020

It seems that this will warn even in the case where no reallocation would have taken place. When I look at the issue, it seems like the concern brought up by @vadimkantorov at #41027 (comment) isn't really addressed by this PR. I am also concerned about this use case: I have seen multiple end user reports where people are purposely overallocating buffers for their outputs and then specifying them as out= arguments. If someone is doing this to avoid fragmentation, how can they avoid this warning? There isn't really any good way to do so.
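A sketch of the over-allocation pattern being described (editor's illustration; sizes are arbitrary):

```python
import torch

a = torch.randn(128, 128)
b = torch.randn(128, 128)

# Deliberately over-allocated buffer, reused across calls to avoid allocator
# fragmentation. Its shape never matches the result, so with this change every
# call emits the deprecation warning even though no reallocation happens.
out = torch.empty(1_000_000)
torch.add(a, b, out=out)
```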

@ngimel
Collaborator

ngimel commented Jul 27, 2020

@ezyang Are users doing it for inference only? It's very tricky to use out= in training (possible only via custom autograd Functions).
To avoid this warning, people would need to allocate their large buffer, and then resize_ it to 0 size. Then the warning will be avoided, and the out will be resize_d to use allocated storage. It may seem like a roundabout way of achieving what currently happens, but current behavior is confusing for smaller-than-needed out and discontiguous out.
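In code, the workaround looks roughly like this (editor's sketch; device and sizes are illustrative):

```python
import torch

# Pre-allocate a large buffer once, e.g. to avoid allocator fragmentation.
buf = torch.empty(1_000_000)

# Shrink it to zero elements: the underlying storage is kept, but numel() == 0,
# so out= is allowed to resize it without a warning.
buf.resize_(0)

a = torch.randn(128, 128)
b = torch.randn(128, 128)
torch.add(a, b, out=buf)   # buf is resize_d to (128, 128), reusing the existing storage
```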

@mruberry
Collaborator Author

Is the test suite warnings clean after this change?

No! Tests aren't clean and we could/should fix them before taking a PR like this.

I have seen multiple end user reports where people are purposely overallocating buffers for their outputs and then specifying them as out= arguments. If someone is doing this to avoid fragmentation, how can they avoid this warning?

This is actually worse than you're making it seem since in the future we would prohibit this behavior! But...

To avoid this warning, people would need to allocate their large buffer, and then resize_ it to 0 size. Then the warning will be avoided, and the out will be resize_d to use allocated storage. It may seem like a roundabout way of achieving what currently happens, but current behavior is confusing for smaller-than-needed out and discontiguous out.

Another imperfect option is to take views of a larger tensor. But as @ngimel and I discussed offline this can be challenging in some cases.
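A sketch of the view-based alternative (editor's illustration; the slicing scheme is up to the user, which is where it gets awkward for mixed shapes and dtypes):

```python
import torch

# One large flat buffer, carved into fixed regions by the user.
arena = torch.empty(2048)

out1 = arena[:512].view(16, 32)       # region for the first result
out2 = arena[512:1024].view(16, 32)   # region for the second result

a = torch.randn(16, 32)
b = torch.randn(16, 32)
torch.add(a, b, out=out1)   # shapes already match, so no resize and no warning
torch.mul(a, b, out=out2)
```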

I would like to let users better express performance optimizations like this, but since they're inherently niche power user activities I think it's OK to make those users jump through one more hoop to be explicit about the behavior they want.

@ezyang
Contributor

ezyang commented Jul 28, 2020

To avoid this warning, people would need to allocate their large buffer, and then resize_ it to 0 size. Then the warning will be avoided, and the out will be resize_d to use allocated storage

OK, this is a neat trick but I agree it works with the way things are today. We should probably document this in the warning, or provide an API for more explicitly "reserving" space in the storage without actually having it show up in the tensor (this is in line with std::vector::reserve, so it seems like a reasonable API surface).

@ezyang
Contributor

ezyang commented Jul 28, 2020

No! Tests aren't clean and we could/should fix them before taking a PR like this.

It would be nice to know the order of magnitude of splash damage here; that would inform whether or not we should fix them before taking this PR.

@mruberry
Collaborator Author

No! Tests aren't clean and we could/should fix them before taking a PR like this.

It would be nice to know the order of magnitude of splash damage here; that would inform whether or not we should fix them before taking this PR.

I'll just fix them.

@mruberry
Collaborator Author

mruberry commented Jul 30, 2020

@ezyang @ngimel I replaced the warning with an error (temporarily) to catch all instances of this behavior covered by our CI, and I have updated the tests as needed. There was actually a bug in quantized multiplication where the output size of the tensor was being inferred incorrectly since it didn't account for broadcasting. I fixed that issue, too.
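To illustrate the shape-inference point (editor's sketch, not the quantized kernel itself): the required out shape is the broadcast of the operands' shapes, not either operand's own shape.

```python
import torch

a = torch.randn(4, 1)
b = torch.randn(1, 5)

# The required output shape must account for broadcasting:
print(torch.mul(a, b).shape)   # torch.Size([4, 5]), not (4, 1) or (1, 5)
```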

Please take a look at the updated language and test fixes. The effect of this change on our own codebase, tests included, seems small.

@mruberry requested a review from ezyang on July 30, 2020 04:30
Contributor

@facebook-github-bot left a comment


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@mruberry merged this pull request in 2f840b1.
