Conversation

@Varal7 Varal7 commented Jun 23, 2021

Stack from ghstack:

Summary: This PR is part of a stack aimed at addressing #58512.
Provide bindings for the SavedVariable class (for custom functions only). The SavedTensor object cannot be created from Python and currently does not expose any method.
Undefined behavior can arise if the user keeps a reference to this SavedTensor after the underlying SavedVariable has been freed.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D29466228

facebook-github-bot commented Jun 23, 2021

💊 CI failures summary and remediations

As of commit a7ccff0 (more details on the Dr. CI page and at hud.pytorch.org/pr/60551):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/2)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Jun 29 20:29:46 ModuleNotFoundError: No module named 'expecttest'
Jun 29 20:29:45 + [[ -n '' ]]
Jun 29 20:29:45 + [[ pytorch-linux-xenial-py3.6-gcc5.4-build == *xla* ]]
Jun 29 20:29:45 + [[ pytorch-linux-xenial-py3.6-gcc5.4-build != *libtorch* ]]
Jun 29 20:29:45 + [[ pytorch-linux-xenial-py3.6-gcc5.4-build != *bazel* ]]
Jun 29 20:29:45 + python test/run_test.py --export-past-test-times
Jun 29 20:29:46 Traceback (most recent call last):
Jun 29 20:29:46   File "test/run_test.py", line 18, in <module>
Jun 29 20:29:46     from torch.testing._internal.common_utils import FILE_SCHEMA, IS_IN_CI, TEST_WITH_ROCM, shell, set_cwd
Jun 29 20:29:46   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 48, in <module>
Jun 29 20:29:46     import expecttest
Jun 29 20:29:46 ModuleNotFoundError: No module named 'expecttest'
Jun 29 20:29:47 + cleanup
Jun 29 20:29:47 + retcode=1
Jun 29 20:29:47 + set +x
Jun 29 20:29:47 =================== sccache compilation log ===================
Jun 29 20:29:47 =========== If your build fails, please take a look at the log above for possible reasons ===========
Jun 29 20:29:47 Compile requests                    5065
Jun 29 20:29:47 Compile requests executed           4711
Jun 29 20:29:47 Cache hits                          4691
Jun 29 20:29:47 Cache hits (C/C++)                  4691
Jun 29 20:29:47 Cache misses                           2

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/2)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Jun 29 20:47:05 ModuleNotFoundError: No module named 'expecttest'
Jun 29 20:47:04 + [[ pytorch-xla-linux-bionic-py3.6-clang9-build != *xla* ]]
Jun 29 20:47:04 ~/workspace
Jun 29 20:47:04 + [[ pytorch-xla-linux-bionic-py3.6-clang9-build != *libtorch* ]]
Jun 29 20:47:04 + [[ pytorch-xla-linux-bionic-py3.6-clang9-build != *bazel* ]]
Jun 29 20:47:04 + python test/run_test.py --export-past-test-times
Jun 29 20:47:05 Traceback (most recent call last):
Jun 29 20:47:05   File "test/run_test.py", line 18, in <module>
Jun 29 20:47:05     from torch.testing._internal.common_utils import FILE_SCHEMA, IS_IN_CI, TEST_WITH_ROCM, shell, set_cwd
Jun 29 20:47:05   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 48, in <module>
Jun 29 20:47:05     import expecttest
Jun 29 20:47:05 ModuleNotFoundError: No module named 'expecttest'
Jun 29 20:47:05 + cleanup
Jun 29 20:47:05 + retcode=1
Jun 29 20:47:05 + set +x
Jun 29 20:47:05 =================== sccache compilation log ===================
Jun 29 20:47:05 =========== If your build fails, please take a look at the log above for possible reasons ===========
Jun 29 20:47:05 Compile requests                      0
Jun 29 20:47:05 Compile requests executed             0
Jun 29 20:47:05 Cache hits                            0
Jun 29 20:47:05 Cache misses                          0
Jun 29 20:47:05 Cache timeouts                        0

Preview docs built from this PR

Varal7 added a commit that referenced this pull request Jun 23, 2021
ghstack-source-id: 366f1d3
Pull Request resolved: #60551

@soulitzer soulitzer left a comment

Looks OK. The main issue, I think, is how to deal with the case where backward has already freed the SavedVariable. Maybe there's some way to do this in pybind, but if we use the plain old PyObject struct, we can have it hold a weak_ptr and raise an error if the C++ side has already been freed.
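
A minimal sketch of that idea, purely illustrative (the struct and field names are hypothetical, not the PR's code, and it assumes the SavedVariable would be owned through a shared_ptr so the binding can observe it via a weak_ptr):

// Hypothetical sketch only, not the actual binding.
struct THPSavedTensor {
  PyObject_HEAD
  // Observes the C++ SavedVariable without keeping it alive.
  std::weak_ptr<torch::autograd::SavedVariable> saved_variable;
};

static PyObject* THPSavedTensor_register_hooks(PyObject* self, PyObject* args) {
  auto sv = reinterpret_cast<THPSavedTensor*>(self)->saved_variable.lock();
  if (!sv) {
    PyErr_SetString(PyExc_RuntimeError,
        "the underlying SavedVariable has already been freed");
    return nullptr;
  }
  // ... forward the Python callables to sv->register_hooks(...) ...
  Py_RETURN_NONE;
}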

Varal7 commented Jun 23, 2021

Looks OK. The main issue, I think, is how to deal with the case where backward has already freed the SavedVariable. Maybe there's some way to do this in pybind, but if we use the plain old PyObject struct, we can have it hold a weak_ptr and raise an error if the C++ side has already been freed.

Thanks for the review!
Yes, after #60565 the behavior is that accessing the raw SavedVariable after it has been freed raises an error (https://github.com/pytorch/pytorch/pull/60565/files#diff-1fdfd061a92957fd11edbaceb538f7186f185b9fb22b7219107b007f7df557fbR750).
The error could be a tiny bit more explicit: in this case we know the user is explicitly trying to access the saved variable rather than trying to backward, but the current error message already mentions that accessing the saved variable is a possible cause.

I can rebase to include it in this PR.
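
For reference, the check added in #60565 is roughly of this shape (a simplified, hedged sketch, not the exact diff; the field names and message wording here are illustrative):

// Simplified sketch of the guard in SavedVariable::unpack():
// if the saved data has already been released, raise a clear error
// instead of handing back an undefined tensor.
TORCH_CHECK(
    data_.defined() || was_default_constructed_,
    "Trying to use a saved tensor after it has already been freed. "
    "Saved intermediate values of the graph are freed when you call "
    ".backward() or autograd.grad().");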

@soulitzer
Contributor

Np!
Ahh yes, that error is probably good to include in this PR as well, for completeness.
The situation I was thinking about is slightly different: the user calls raw_saved_tensors and saves a reference to a SavedVariable, then calls backward.

@albanD albanD removed their request for review June 23, 2021 22:21

albanD commented Jun 23, 2021

Removing myself as @soulitzer is handling this one.
You can add me back if you have a specific question for me!

Varal7 commented Jun 24, 2021

The situation I was thinking about is slightly different: the user calls raw_saved_tensors and saves a reference to a SavedVariable, then calls backward.

Good point. My proposal is the following.
Because the only method we expose (and plan to expose) is SavedVariable::register_hooks, we can check there that the variable has not already been freed.
So if the user keeps a reference to the raw_saved_tensors after calling backward, they will just hold a useless object and will get a clear error message if they try to call register_hooks on it.
Like so: https://github.com/pytorch/pytorch/pull/60663/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030R156
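
A rough sketch of what that check could look like inside SavedVariable::register_hooks (hedged; the exact signature, field name, and message are in the linked diff, not here):

// Illustrative sketch of the proposed guard.
void SavedVariable::register_hooks(/* pack/unpack hooks */) {
  TORCH_CHECK(
      data_.defined(),
      "Calling register_hook on a saved tensor after it has been freed.");
  // ... store the hooks and pack the saved data ...
}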

@Varal7 Varal7 requested a review from soulitzer June 24, 2021 17:43
@soulitzer
Contributor

That's a good plan, but aren't there also some cases where the entire SavedVariable is destroyed, not just its data field?
For example, all custom functions (we could add a test for this here) and some codegen'd functions store saved variables in a vector, and on release_variables the vector is cleared.

@Varal7 Varal7 mentioned this pull request Jun 24, 2021

Varal7 commented Jun 24, 2021

I do agree with you, and I've written tests covering those cases (here and here).

What is confusing to me is that the following code does trigger "RuntimeError: Calling register_hook on a saved tensor after it has been freed.", as we would want.

import torch
from torch.autograd import Function

class MyFn(Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x

    @staticmethod
    def backward(ctx, g):
        return g

y = MyFn.apply(torch.ones(5, requires_grad=True))
t = y.grad_fn.raw_saved_tensors
y.sum().backward()  # calls grad_fn->release_variables() and the destructors for each SavedVariable (but not `.reset()`)
t[0].register_hooks(lambda x: x, lambda x: x)

@soulitzer
Contributor

It could be because vector.clear just updates the size while the capacity stays the same, so the memory is still reserved by the vector.
Maybe we should call shrink_to_fit on the vector to actually release the memory (though technically even that is not guaranteed).
It could be interesting to see whether this improves memory usage too.
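
An illustrative sketch of that suggestion for a codegen'd node (the node and member names here are made up for the example):

// Illustrative sketch only; names are hypothetical.
void MyBackwardNode::release_variables() {
  // clear() destroys the SavedVariable elements but leaves the vector's
  // capacity untouched, so the underlying buffer stays allocated.
  saved_tensors_.clear();
  // The suggestion: also ask the vector to return the buffer to the
  // allocator (a non-binding request in the C++ standard).
  saved_tensors_.shrink_to_fit();
}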

Varal7 added a commit to Varal7/pytorch that referenced this pull request Jun 28, 2021
ghstack-source-id: 06d3801
Pull Request resolved: pytorch#60551

@soulitzer soulitzer left a comment

LGTM

@albanD albanD left a comment

Just a couple of small comments.

Varal7 commented Jun 29, 2021

@Varal7 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Varal7 commented Jun 29, 2021

@Varal7 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

This pull request has been merged in f54290f.
