fix pca_lowrank memory consumption #40853
Conversation
💊 CI failures summary and remediations
As of commit 7184385 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
pearu left a comment:
LGTM, thanks for this simplification!
pearu left a comment:
Suggestion: use stride tricks to avoid creating possibly large temporary tensors.
```python
C = A.mean(dim=(-2,), keepdim=True)
return _svd_lowrank(A - C, q, niter=niter, M=None)
```
Could you change this code to:
```python
C = A.mean(dim=(-2,), keepdim=True)
M_strides = [C.stride(i) for i in range(len(C.shape))]
M_strides[-2] = 0
M = C.as_strided(A.shape, M_strides)
return _svd_lowrank(A, q, niter=niter, M=M)
```
This will avoid creating a possibly large temporary A - C.
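For illustration, a minimal sketch of the zero-stride view this suggestion builds (shapes and variable names are made up, not from the PR):
```python
import torch

A = torch.randn(10000, 64)
C = A.mean(dim=(-2,), keepdim=True)        # shape (1, 64)

# Zero-stride view of C with A's shape: every logical row aliases C's
# single row, so no (10000, 64) buffer is allocated for M.
M_strides = list(C.stride())
M_strides[-2] = 0
M = C.as_strided(A.shape, M_strides)

print(M.shape)                             # torch.Size([10000, 64])
print(M.data_ptr() == C.data_ptr())        # True: same underlying storage
```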
Will do that, but I suspect it will just postpone the inevitable - when matmul sees discontiguous arguments, it blows them up. Will have to double-check the matmul code, but I vaguely remember it from when I looked at it previously.
Do you ever backprop through pca_lowrank? If not, and A is otherwise not needed, A - C can be computed in place, avoiding a large memory allocation.
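A minimal sketch of that in-place variant, under the same assumptions (A is not needed afterwards and autograd is not involved; shapes are arbitrary):
```python
import torch

A = torch.randn(10000, 64)
C = A.mean(dim=-2, keepdim=True)   # (1, 64) column means

# Broadcasted in-place subtraction: A is overwritten row by row,
# so no separate (10000, 64) tensor is allocated for A - C.
A.sub_(C)
```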
> Will do that, but I suspect it will just postpone the inevitable - when matmul sees discontiguous arguments, it blows them up

I tested the code locally; all pca tests passed OK.
Yeah, sorry, by "blow up" I meant it will materialize expanded contiguous tensors. So yeah, the tests will pass, but memory usage is possibly still suboptimal.
The memory of a is contiguous, but the tensor a is not contiguous. In a contiguous tensor, no two logically different elements point to the same memory location, whereas in a all elements point to the same memory location.
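A small illustration of that distinction (hypothetical tensors, not the ones from the PR):
```python
import torch

c = torch.tensor(1.0)        # one element; its memory is trivially contiguous
a = c.expand(4, 5)           # 20 logical elements, all aliasing that one element

print(c.is_contiguous())     # True
print(a.is_contiguous())     # False: strides are (0, 0), so logically
                             # different elements share one memory location
print(a.stride())            # (0, 0)
```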
Right, the whole point of the expand-to-contiguous conversion is to ensure that the tensor data is C-contiguous and can be passed to some low-level routine that is not stride-aware, that is, a routine that accesses the data memory according to the shape information rather than the strides. That said, matmul low-level routines ought to be stride-aware (thinking of LAPACK), so the expand-to-contiguous conversion would be unnecessary. So there could be some ideas for matmul optimization here.
I hear what you are saying; however, matmul (let's talk about 2d tensors for simplicity) will typically be dispatched to a BLAS library that implements the gemm API. As you can see here, lda/ldb, which provide the stride-awareness, are explicitly constrained to be positive, not 0. That said, instead of dispatching calls with stride=0 arguments to BLAS, doing a pointwise multiplication of the arguments followed by a reduction and expanding back to the necessary size is likely to be faster. Double points if the multiplication could be done on the fly during the reduction, so that the pointwise product A*B did not have to be materialized, but unfortunately PyTorch cannot do that; you'd need KeOps for that.
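A rough sketch of what that alternative could look like for a zero-stride operand like the expanded mean (shapes and names are made up for illustration, not code from the PR):
```python
import torch

m, n, q = 10000, 512, 6
C = torch.randn(1, n)              # e.g. the row of column means
M = C.expand(m, n)                 # zero-stride matrix of identical rows
Q = torch.randn(n, q)

# BLAS gemm cannot take a zero leading dimension, so this path typically
# materializes a contiguous (m, n) copy of M before the multiply.
out_gemm = M @ Q

# Pointwise multiply + reduction on the small operand, then expand back:
# the only new allocations are an (n, q) intermediate and the (1, q) result.
out_cheap = (C.transpose(-2, -1) * Q).sum(dim=0, keepdim=True).expand(m, q)

print(torch.allclose(out_gemm, out_cheap, atol=1e-3))   # True, up to fp error
```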
Surprised that BLAS doesn't support zero stride :( Do we need to write specialized kernels for these cases?
The joys of '80s libraries for you :-) The primitive we need here to avoid materialization is "broadcasted multiplication followed by a reduction" #32591 (comment); the einsum people have been talking about it for a long time. It sounds easy, but given how varied broadcasting patterns can be and how hard it generally is to write an efficient reduction for various access patterns, it turns out to be pretty hard. There are libraries doing it with varying degrees of success (KeOps and cuTENSOR are the ones I know of).
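As a rough illustration of why this primitive is awkward to emulate in eager PyTorch (made-up shapes; einsum shown only as the existing way to express the same contraction):
```python
import torch

A = torch.randn(512, 64)
B = torch.randn(64, 128)

# Broadcasted multiply followed by a reduction, written "by hand":
# the (512, 64, 128) product is fully materialized before the sum,
# which is exactly the allocation a fused kernel would avoid.
out_manual = (A.unsqueeze(-1) * B.unsqueeze(0)).sum(dim=1)

# einsum expresses the same contraction; this simple pattern can be lowered
# to matmul, but arbitrary broadcast-and-reduce patterns generally need a
# fused kernel (KeOps, cuTENSOR, ...) to avoid the intermediate.
out_einsum = torch.einsum('ik,kj->ij', A, B)

print(torch.allclose(out_manual, out_einsum, atol=1e-3))
```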
pearu left a comment:
Approving the PR as it is because my suggestion was invalid.
facebook-github-bot left a comment:
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Per title, fixes #40768