Update functorch supported autograd.Function to allow mark_dirty #91222
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91222
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 15e8773: This comment was automatically generated by Dr. CI and updates every 15 minutes.
test/functorch/test_ops.py (Outdated)
xfail('nn.functional.rrelu'),  # in-place test errors out with no formula implemented
xfail('NumpyExpMarkDirtyAutogradFunction'),  # https://github.com/pytorch/pytorch/issues/90225
xfail('NumpyExpMarkDirtyAutogradFunction'),  # TODO: calling in-place operation that would mutate a captured Tensor
This errors for a different reason now, need to investigate.
@zou3519 I think I figured out what the issue was with this, but I'm not sure what the solution is.
In C++ functorch:
- we create a dual tensor whose tangent is captured (so it has an immutable wrapper)
- when we call into exp_, checkForInvalidMutationOnCaptures does not care about the tangent having an immutable wrapper, because that is hidden by the dual tensor's wrapper, which isn't immutable
- before we call into VariableType, we first exclude dynamicLayerFront
- the forward-AD formula for in-place ops does tangent.copy_(new_tangent) (sketched below, just before the repro). Because we already excluded dynamicLayerFront, we just go into VariableType again (which is basically a no-op, since neither tensor has a tangent this time around)
In pyfunctorch:
- we still create that immutable wrapper for the tangent
- when we call process, we do not exclude dynamicLayerFront
- process constructs the single-layer autograd Function and calls apply (which calls into forward, then jvp)
- after forward is done (with no problems), jvp is performed, which does tangent.mul_(output). At this point, JvpTransform is still at the top of the stack.
- since we did not exclude this time, we go into dynamicLayerFront, which now errors in checkForInvalidMutationOnCaptures, because we're performing an in-place op on the tangent, which has the immutable wrapper.
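Here is a minimal plain-PyTorch sketch (no functorch) of the in-place forward-AD tangent update both flows run into; the comments reflect my reading of the formulas rather than the exact internals:

```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.rand(3)
tangent = torch.rand(3)

with fwAD.dual_level():
    dual = fwAD.make_dual(primal, tangent)
    dual.exp_()  # in-place op on a dual tensor
    _, new_tangent = fwAD.unpack_dual(dual)
    # The forward-AD rule for exp_ updates the stored tangent in place (roughly
    # tangent.copy_(tangent * result)). Outside functorch that is unremarkable;
    # under a functorch transform the tangent can carry an immutable wrapper,
    # which is what checkForInvalidMutationOnCaptures complains about in pyfunctorch.
```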
Click for repro:

```python
from functorch import vmap, jvp
import torch
import numpy as np

def to_numpy(tensor):
    return tensor.cpu().numpy()

class NumpyMul(torch.autograd.Function):
    @staticmethod
    def forward(x, y):
        return torch.tensor(to_numpy(x) * to_numpy(y), device=x.device)

    @staticmethod
    def setup_context(ctx, inputs, outputs):
        ctx.save_for_backward(*inputs)
        ctx.save_for_forward(*inputs)

    @staticmethod
    def backward(ctx, grad_output):
        x, y = ctx.saved_tensors
        gx = None
        if ctx.needs_input_grad[0]:
            gx = NumpyMul.apply(grad_output, y)
        gy = None
        if ctx.needs_input_grad[1]:
            gy = NumpyMul.apply(grad_output, x)
        return gx, gy

    @staticmethod
    def vmap(info, in_dims, x, y):
        x_bdim, y_bdim = in_dims
        x = x.movedim(x_bdim, -1) if x_bdim is not None else x.unsqueeze(-1)
        y = y.movedim(y_bdim, -1) if y_bdim is not None else y.unsqueeze(-1)
        result = NumpyMul.apply(x, y)
        result = result.movedim(-1, 0)
        return result, 0

    @staticmethod
    def jvp(ctx, x_tangent, y_tangent):
        x, y = ctx.saved_tensors
        return x_tangent * y + y_tangent * x

class NumpyExp_(torch.autograd.Function):
    @staticmethod
    def forward(x):
        x_np = to_numpy(x)
        np.exp(x_np, x_np)
        return x

    @staticmethod
    def setup_context(ctx, inputs, outputs):
        x, = inputs
        ctx.mark_dirty(x)
        ctx.save_for_backward(outputs)
        ctx.save_for_forward(outputs)

    @staticmethod
    def backward(ctx, grad_output):
        output, = ctx.saved_tensors
        return NumpyMul.apply(grad_output, output)

    @staticmethod
    def vmap(info, in_dims, x):
        NumpyExp_.apply(x)
        return x, in_dims[0]

    @staticmethod
    def jvp(ctx, x_tangent):
        output, = ctx.saved_tensors
        x_tangent.mul_(output)
        return x_tangent

def fn(x):
    # return torch.exp_(x) <-- does not error
    return NumpyExp_.apply(x)

a = torch.rand(4,)
b = torch.rand(4,)

with torch.autograd.function._set_autograd_function_extension_enabled(True):
    jvp(fn, (a,), (b,))
```
"jvp is performed, which does tangent.mul_(output)"
Where is the mul_ in the code?
In the jvp that NumpyExp_ defines:
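For reference, the relevant method from the repro above:

```python
@staticmethod
def jvp(ctx, x_tangent):
    output, = ctx.saved_tensors
    x_tangent.mul_(output)  # <- the in-place mul_ on the tangent
    return x_tangent
```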
Maybe one solution could be:
- We could just say that it is okay for us to go through process again. It is basically a no-op, since the stack is the same. Technically it does more checks, but maybe that is fine and we actually want those checks?
- Currently, in the creation of a dual tensor, the primal and tangent are not explicitly wrapped; instead we rely on them getting automatically lifted. If we manually wrap the tangent (and primal) instead, this error should no longer trigger even if we go through process an extra time. Since the tangent is something the user passed in themselves, we should be okay with mutating it, and should not mark it with the immutable wrapper.
Alternate solution (doesn't work):
- I also tried excluding manually in PyFunctorch's process to mimic the C++ version, but ran into an issue with unwrapped_count > 0 INTERNAL ASSERT FAILED in the dead tensor wrapper fallback, and I'm not sure what that means yet. (What does this mean?)
I'm still processing what is going on, but let me reply to your questions:
"I also tried excluding manually in PyFunctorch's process to mimic the C++ version but ran into an issue with unwrapped_count > 0 INTERNAL ASSERT FAILED in the dead tensor wrapper fallback and not sure what that means yet. (What does this mean?)"
There's an invariant that a Tensor with the FuncTorchTensorWrapper dispatch key must be a TensorWrapper. Given that we hit the dead_tensor_fallback, at least one of the inputs must be a TensorWrapper. The assertion is complaining that none of the inputs are TensorWrappers.
The thing I am struggling with a bit right now is: does the in-place mutation check even make sense for forward-mode AD?
- If it does, then it sounds like C++ functorch is wrong, because it bypasses it.
- If it doesn't, then to what extent can we just get rid of it from both C++ and Python functorch?
Claim: the input-mutation check makes sense for forward-mode AD. We want to prevent a situation where the dual tensor is created on the wrong TensorWrapper.
There are two cases here:
Case 1: captured value mutated in-place. If we have:

```python
y = torch.tensor(1.)

def f(x):
    y.copy_(x)
    return x + y

jvp(f, (x,), (t,))
```

Then the dual should be created on the wrapped version of y, not y itself. The in-place error checks should ideally raise an error in this situation.
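For concreteness, a runnable completion of that snippet (the values of x and t below are mine, just example scalars):

```python
import torch
from functorch import jvp

y = torch.tensor(1.)

def f(x):
    y.copy_(x)   # mutates a tensor captured from outside the transformed function
    return x + y

x = torch.tensor(2.)
t = torch.tensor(1.)
# Per the claim above, the in-place checks should ideally raise here, since the
# dual should be created on the wrapped version of y rather than on y itself.
jvp(f, (x,), (t,))
```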
Case 2: tangent tensor mutated in-place (which is what is happening in this PR).

```python
import torch
import torch.autograd.forward_ad as fwAD

x = torch.tensor(2.)
y = torch.tensor(3.)

with fwAD.dual_level():
    x_dual = fwAD.make_dual(x, y)
    y.copy_(x_dual)
    x, x_tangent = fwAD.unpack_dual(x_dual)
```

If we ran the functorch.jvp equivalent of the above, it's important that the tangent of x is a TensorWrapper, because it ends up getting its own tangent value.
Solution?
Given the above, I like one of the solutions you proposed above, which is:
Currently in the creation of a dual tensor, the primal and tangent are not explicitly wrapped, instead we rely on them to get automatically lifted. If we manually wrap tangent (and primal) instead, this error should no longer trigger even if we go through process an extra time. Since tangent is something the user passed in themselves, we should be okay with mutating it, and not mark it with the immutable wrapper.
functorch.jvp should wrap the primal and the tangent before calling make_dual. The end state is that we get TensorWrapper(primal) that has a tangent which is TensorWrapper(tangent).
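In rough pseudocode, that end state might look like this (wrap_at_level is a hypothetical stand-in for functorch's internal wrapping step, not a real API):

```python
import torch
import torch.autograd.forward_ad as fwAD

def wrap_at_level(t, level):
    # Placeholder only: real functorch would return a mutable TensorWrapper
    # for the given interpreter level here.
    return t

def make_jvp_duals(primals, tangents, level):
    # Must be called inside fwAD.dual_level(). Both primal and tangent are wrapped
    # explicitly, so the tangent's wrapper is mutable and in-place jvp formulas
    # (e.g. tangent.mul_(...)) no longer trip the captured-tensor mutation check.
    return tuple(
        fwAD.make_dual(wrap_at_level(p, level), wrap_at_level(t, level))
        for p, t in zip(primals, tangents)
    )
```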
Thoughts? Also, thank you for the detailed analysis; it saved me from stepping through the code in gdb.
Case 2 is actually a bug in PyTorch forward mode AD. Even if we make primal and tangent both have TensorWrapper at the same level, the tangent's TensorWrapper should itself never have a tangent. Normally we'd error if we're setting a tangent that itself has a tangent, but we're getting around that check with an in-place lol.
I think that, morally, the tangent should not be wrapped at the same level as the primal (I see the tangent as metadata that lives on the primal's wrapper, so in a sense it should be subordinate to the primal). The tangent is being wrapped today because we are computing with it while JVP is active; in theory we are only computing with plain tensors at that point, so (if the forward/backward AD kernels were separate) we should be able to exclude the Autograd key and properly unwrap and pop JVP off the stack before computing forward grads.
That being said, I still think that it is a good idea to manually wrap tangent at the same level as primal today to indicate that it is a tensor explicitly passed in so that its AD metadata isn't immutable.
zou3519 left a comment:
LGTM, some minor comments. I assume we are punting the handling of the TODO to the future (but feel free to dig into it more if you're interested)
# skip because this is flaky depending on what the max_norm is!
skip('nn.functional.embedding', ''),
skip('to'),  # RuntimeError: required rank 4 tensor to use channels_last format
xfail('NumpyExpMarkDirtyAutogradFunction'),  # vmap: inplace into a regular tensor
Just to check, this is not "calling in-place operation that would mutate a captured Tensor", right?
Yup, leaving this for a follow-up for now.
@pytorchbot merge -g
Merge failed. Reason: Not merging any PRs at the moment because there is a merge blocking https://github.com/pytorch/pytorch/labels/ci:%20sev issue open at:
Details for Dev Infra team: Raised by workflow job
# def setup_context(ctx, outputs, x):
#     y = outputs
# def setup_context(ctx, inputs, output):
#     y = output
Why the rename from outputs -> output? Is it a single output now? Or are they unpacked?
It has always been a single output; I'm just updating the name to reflect that. Since we return what the user returned from forward as-is, it can sometimes be a tuple, depending on what the user returns.
In an earlier version of this PR I made it always pass in a tuple for consistency, but after discussion here (#91222 (comment)), I decided to revert that change.
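For illustration, a small sketch in the same style as the repro above: output is exactly what forward returned, so it is a tuple only when forward returns a tuple (the SquareAndCube example is mine, not from the PR):

```python
import torch

class SquareAndCube(torch.autograd.Function):
    @staticmethod
    def forward(x):
        # forward returns a tuple, so setup_context receives that tuple as `output`
        return x ** 2, x ** 3

    @staticmethod
    def setup_context(ctx, inputs, output):
        x, = inputs
        sq, cube = output  # exactly what forward returned
        ctx.save_for_backward(x)

    @staticmethod
    def backward(ctx, grad_sq, grad_cube):
        x, = ctx.saved_tensors
        return 2 * x * grad_sq + 3 * x ** 2 * grad_cube
```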
# @staticmethod
# def setup_context(ctx, outputs, x):
#     y = outputs
# def setup_context(ctx, inputs, output):
Did you actually swap the order? That wasn't reflected in the tests.
This is just an outdated comment; this is now the correct order.
(see the python bindings)
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here.
Rebase failed due to Command
Raised by https://github.com/pytorch/pytorch/actions/runs/3790618497
@pytorchbot merge -g
Merge started. Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
Fixes #90225
Uses what was originally in #89860