autograd codegen: bump VC properly for mutable ops with no returns #133044

bdhirsh · 2024-08-08T22:51:31Z

Fixes #132014

In particular, this PR should:

(1) start generating and registering kernels to the ADInplaceOrView dispatch key for mutable ops that have no returns (like `split_with_sizes_copy.out`)

(2) Previously, the codegen looped over mutable return names to decide which tensors to codegen VC (version counter) bumps for, so an op that mutates its inputs but returns nothing got no bumps. Now, we just loop over mutable input argument names.
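The difference between the two loops in (2) can be sketched in plain Python. This is a toy model, not the real torchgen code: `Arg`, `Schema`, `vc_bump_targets_old`, `vc_bump_targets_new`, and `emit_vc_bumps` are hypothetical names, the argument lists are simplified, and the emitted `increment_version(...)` strings are only illustrative of what a generated kernel might contain.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Arg:
    """Hypothetical stand-in for one schema argument."""
    name: str
    is_write: bool = False  # True when the schema marks the argument as mutated, e.g. Tensor(a!)


@dataclass(frozen=True)
class Schema:
    """Hypothetical, simplified operator schema."""
    name: str
    args: Tuple[Arg, ...]
    mutable_returns: Tuple[str, ...] = ()  # names of returns that alias a mutated input


def vc_bump_targets_old(schema: Schema) -> List[str]:
    # Behavior described as "previously": only mutable *returns* are considered.
    return list(schema.mutable_returns)


def vc_bump_targets_new(schema: Schema) -> List[str]:
    # Behavior described as "now": every mutable *input argument* is considered.
    return [arg.name for arg in schema.args if arg.is_write]


def emit_vc_bumps(names: List[str]) -> List[str]:
    # Illustrative only: one version-counter bump statement per target tensor.
    return [f"increment_version({name});" for name in names]


# An in-place op that returns the tensor it mutates (simplified argument list).
ADD_ = Schema("add_.Tensor", (Arg("self", is_write=True), Arg("other")), ("self",))

# An out= op that mutates `out` but returns nothing (simplified argument list).
SPLIT_OUT = Schema(
    "split_with_sizes_copy.out",
    (Arg("self"), Arg("split_sizes"), Arg("dim"), Arg("out", is_write=True)),
    (),
)

if __name__ == "__main__":
    for schema in (ADD_, SPLIT_OUT):
        print(schema.name)
        print("  old:", emit_vc_bumps(vc_bump_targets_old(schema)))
        print("  new:", emit_vc_bumps(vc_bump_targets_new(schema)))
```

In this model both loops agree for an op like `add_`, which returns the tensor it mutates, while only the input-argument loop finds `out` for the no-return case.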

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec

[ghstack-poisoned]

pytorch-bot · 2024-08-08T22:51:34Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133044

❌ 44 New Failures

As of commit cfe4273 with merge base b040dc3:

NEW FAILURES - The following jobs have failed:

inductor / cuda12.6-py3.10-gcc9-sm86 / test (inductor_huggingface, 1, 1, linux.g5.4xlarge.nvidia.gpu) (gh)
Process completed with exit code 1.
inductor / unit-test / cuda12.6-py3.10-gcc9-sm86 / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rrelu_cuda_float64
inductor / unit-test / cuda12.6-py3.10-gcc9-sm86 / test (inductor, 2, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rrelu_cuda_float32
inductor / unit-test / cuda12.6-py3.12-gcc9-sm86 / test (inductor, 2, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rrelu_cuda_float32
inductor / unit-test / cuda12.6-py3.13-gcc9-sm86 / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rrelu_cuda_float64
inductor / unit-test / cuda12.6-py3.13-gcc9-sm86 / test (inductor, 2, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rrelu_cuda_float32
inductor / unit-test / linux-jammy-cpu-py3.9-gcc11-inductor / test (inductor_amx, 2, 2, linux.8xlarge.amx) (gh)
test_ops.py::TestCommonCPU::test_variant_consistency_eager_nn_functional_rrelu_cpu_float32
inductor / unit-test / linux-jammy-cpu-py3.9-gcc11-inductor / test (inductor_avx2, 2, 2, linux.10xlarge.avx2) (gh)
test_ops.py::TestCommonCPU::test_variant_consistency_eager_nn_functional_rrelu_cpu_float32
inductor-rocm / rocm6.3-py3.10-inductor / test (inductor, 1, 2, linux.rocm.gpu.2) (gh)
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rrelu_cuda_float32
Lint / lintrunner-noclang / linux-job (gh)
>>> Lint for torch/_inductor/compile_fx.py:
linux-aarch64 / linux-jammy-aarch64-py3.10 / test (default, 2, 4, linux.arm64.2xlarge) (gh)
test_ops.py::TestCommonCPU::test_variant_consistency_eager_nn_functional_rrelu_cpu_float32
linux-aarch64 / linux-jammy-aarch64-py3.10 / test (default, 3, 3, linux.arm64.m7g.4xlarge) (gh)
test_ops.py::TestCommonCPU::test_variant_consistency_eager_nn_functional_rrelu_cpu_float32
pull / linux-focal-cuda12.6-py3.10-gcc11 / test (default, 1, 5, linux.4xlarge.nvidia.gpu) (gh)
test_nn.py::TestNN::test_inplace_thnn
pull / linux-focal-cuda12.6-py3.10-gcc11 / test (default, 3, 5, linux.4xlarge.nvidia.gpu) (gh)
functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation
pull / linux-focal-cuda12.6-py3.10-gcc11 / test (default, 5, 5, linux.4xlarge.nvidia.gpu) (gh)
dynamo/test_repros.py::ReproTestsDeviceCUDA::test_partitioner_saves_weights_for_bw_cuda
pull / linux-focal-cuda12.6-py3.10-gcc11-sm89 / test (default, 1, 5, linux.g6.4xlarge.experimental.nvidia.gpu) (gh)
test_nn.py::TestNN::test_inplace_thnn
pull / linux-focal-cuda12.6-py3.10-gcc11-sm89 / test (default, 5, 5, linux.g6.4xlarge.experimental.nvidia.gpu) (gh)
functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation
pull / linux-focal-py3.13-clang10 / test (crossref, 1, 2, linux.2xlarge) (gh)
test_nn.py::TestNN::test_inplace_thnn
pull / linux-focal-py3.13-clang10 / test (crossref, 2, 2, linux.2xlarge) (gh)
functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation
pull / linux-focal-py3.13-clang10 / test (default, 1, 5, linux.4xlarge) (gh)
test_nn.py::TestNN::test_inplace_thnn
pull / linux-focal-py3.13-clang10 / test (default, 3, 5, linux.4xlarge) (gh)
functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation
pull / linux-focal-py3.13-clang10 / test (default, 5, 5, linux.4xlarge) (gh)
test_optim.py::TestOptimRenewedCPU::test_grads_are_never_inplaced_into_Adagrad_cpu_float32
pull / linux-focal-py3.13-clang10 / test (dynamo_wrapped, 2, 3, linux.2xlarge) (gh)
test_nn.py::TestNN::test_inplace_thnn
pull / linux-focal-py3.13-clang10 / test (dynamo_wrapped, 3, 3, linux.2xlarge) (gh)
test_optim.py::TestOptimRenewedCPU::test_grads_are_never_inplaced_into_Adagrad_cpu_float32
pull / linux-focal-py3.9-clang10 / test (crossref, 1, 2, linux.2xlarge) (gh)
test_nn.py::TestNN::test_inplace_thnn
pull / linux-focal-py3.9-clang10 / test (crossref, 2, 2, linux.2xlarge) (gh)
functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation
pull / linux-focal-py3.9-clang10 / test (default, 1, 5, linux.4xlarge) (gh)
test_nn.py::TestNN::test_inplace_thnn
pull / linux-focal-py3.9-clang10 / test (default, 3, 5, linux.4xlarge) (gh)
functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation
pull / linux-focal-py3.9-clang10 / test (default, 5, 5, linux.4xlarge) (gh)
test_optim.py::TestOptimRenewedCPU::test_grads_are_never_inplaced_into_Adagrad_cpu_float32
pull / linux-focal-py3.9-clang10 / test (dynamo_wrapped, 2, 3, linux.2xlarge) (gh)
test_nn.py::TestNN::test_inplace_thnn
pull / linux-focal-py3.9-clang10 / test (dynamo_wrapped, 3, 3, linux.2xlarge) (gh)
test_optim.py::TestOptimRenewedCPU::test_grads_are_never_inplaced_into_Adagrad_cpu_float32
pull / linux-jammy-py3.10-clang15-asan / test (default, 1, 6, linux.4xlarge) (gh)
test_nn.py::TestNN::test_inplace_thnn
pull / linux-jammy-py3.10-clang15-asan / test (default, 4, 6, linux.4xlarge) (gh)
functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation
pull / linux-jammy-py3.9-gcc11 / test (default, 1, 5, linux.2xlarge) (gh)
test_nn.py::TestNN::test_inplace_thnn
pull / linux-jammy-py3.9-gcc11 / test (default, 3, 5, linux.2xlarge) (gh)
test_optim.py::TestOptimRenewedCPU::test_grads_are_never_inplaced_into_Adagrad_cpu_float32
pull / linux-jammy-py3.9-gcc11 / test (default, 5, 5, linux.2xlarge) (gh)
functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation
trunk / linux-focal-rocm6.3-py3.10 / test (default, 1, 2, linux.rocm.gpu.2) (gh)
test_nn.py::TestNN::test_inplace_thnn
trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable) (gh)
test_nn.py::TestNN::test_inplace_thnn
trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable) (gh)
functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation
trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable) (gh)
test_optim.py::TestOptimRenewedCPU::test_grads_are_never_inplaced_into_Adagrad_cpu_float32
trunk / macos-py3-arm64-mps / test (mps, 1, 1, macos-m1-13) (gh)
test_nn.py::TestNN::test_inplace_thnn
trunk / macos-py3-arm64-mps / test (mps, 1, 1, macos-m1-14) (gh)
test_nn.py::TestNN::test_inplace_thnn
trunk / win-vs2022-cpu-py3 / test (default, 1, 3, lf.windows.4xlarge.nonephemeral) (gh)
test_nn.py::TestNN::test_inplace_thnn
trunk / win-vs2022-cpu-py3 / test (default, 2, 3, lf.windows.4xlarge.nonephemeral) (gh)
functorch\test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang

Nice fix

bdhirsh · 2024-08-09T17:15:41Z

test/cpp_extensions/open_registration_extension.cpp

-  at::TensorList tensors = {first, first};
-  at::TensorList undefined_tensors = {first, second};
-  at::TensorList steps = {step, step};
-  return at::_fused_adamw_(tensors, tensors, tensors, tensors, undefined_tensors,


So calling `_fused_adamw_` here with undefined inputs feels like it was just wrong: we now error because you are passing in a mutable, undefined tensor, and autograd is not allowed to bump the VC on an undefined tensor.

To make the test a bit more realistic, I updated it to use `aten::index()`, which accepts undefined tensor indices (which are not mutated).
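A toy model of why the old test now errors, assuming only what the comment above states: a mutating op bumps the version counter of each mutable input, and an undefined tensor has no counter to bump. `ToyTensor`, `bump_version`, `toy_mutating_op`, and `toy_indexing_op` are hypothetical names for illustration and are not PyTorch APIs.

```python
from typing import List


class ToyTensor:
    """Hypothetical stand-in for a tensor that may be undefined."""

    def __init__(self, defined: bool = True) -> None:
        self.defined = defined
        self.version = 0  # only meaningful when the tensor is defined


def bump_version(t: ToyTensor) -> None:
    # An undefined tensor has no version counter, so a bump must fail.
    if not t.defined:
        raise RuntimeError("cannot bump the version counter of an undefined tensor")
    t.version += 1


def toy_mutating_op(mutable_inputs: List[ToyTensor]) -> None:
    # Models an op whose inputs are all mutated: every one gets a bump.
    for t in mutable_inputs:
        bump_version(t)


def toy_indexing_op(source: ToyTensor, indices: List[ToyTensor]) -> ToyTensor:
    # Models an op that only reads its inputs: no bumps, so undefined
    # entries in `indices` are harmless here.
    return ToyTensor()


if __name__ == "__main__":
    first, undefined = ToyTensor(), ToyTensor(defined=False)
    try:
        toy_mutating_op([first, undefined])
    except RuntimeError as err:
        print("mutating op failed:", err)
    toy_indexing_op(first, [undefined])
    print("indexing op succeeded")
```

The model is only meant to show the shape of the failure: the problem is the undefined tensor sitting in a mutable position, which is why switching the test to an op that merely reads its possibly-undefined inputs avoids it.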

bdhirsh · 2024-08-09T17:16:27Z

@pytorchbot merge

pytorchmergebot · 2024-08-09T17:18:46Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2024-08-09T17:24:09Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

pull / linux-focal-cpu-py3.10-gcc9-bazel-test / build-and-test (default, 1, 1, amz2023.linux.4xlarge)

Failing merge rule: Core Maintainers

albanD

Nice!

medivh-xp · 2024-09-04T06:06:03Z

@pytorchbot merge

pytorchmergebot · 2024-09-04T06:08:16Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

github-actions · 2025-05-13T16:40:20Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

autograd codegen: bump VC properly for mutable ops with no returns

ae5d7c1

[ghstack-poisoned]

bdhirsh requested review from albanD and soulitzer as code owners August 8, 2024 22:51

bdhirsh mentioned this pull request Aug 8, 2024

track number of cpp->python exceptions thrown in torch.compile benchmark suite #131481

Closed

github-actions bot requested review from SherlockNoMad, antoniojkim, ezyang and miladm August 8, 2024 22:51

bdhirsh added a commit that referenced this pull request Aug 8, 2024

autograd codegen: bump VC properly for mutable ops with no returns

ddfffd5

ghstack-source-id: 5fc326e Pull Request resolved: #133044

ezyang approved these changes Aug 9, 2024

bdhirsh added a commit that referenced this pull request Aug 9, 2024

autograd codegen: bump VC properly for mutable ops with no returns

4fdc876

ghstack-source-id: be96f2f Pull Request resolved: #133044

bdhirsh commented Aug 9, 2024

bdhirsh added the `release notes: autograd` label (release notes category) Aug 9, 2024

pytorch-bot bot added the `ciflow/trunk` label (Trigger trunk jobs on your pull request) Aug 9, 2024

pytorchmergebot added the merging label Aug 9, 2024

pytorchmergebot removed the merging label Aug 9, 2024

albanD approved these changes Aug 13, 2024

bdhirsh mentioned this pull request Aug 17, 2024

log ViewAndMutationMeta to trace_structured #133784

Closed

bdhirsh mentioned this pull request Aug 22, 2024

AOTDispatcher: limit cases when we detach() graph inputs to non-leaves #134193

Closed

pytorchmergebot added the merging label Sep 4, 2024

bdhirsh mentioned this pull request Mar 7, 2025

replace usages of upload_graph in inductor with tlparse (v2) #148720

Closed

bdhirsh added a commit that referenced this pull request Mar 7, 2025

autograd codegen: bump VC properly for mutable ops with no returns

38042bf

ghstack-source-id: 1453582 Pull Request resolved: #133044

bdhirsh added a commit that referenced this pull request Mar 8, 2025

autograd codegen: bump VC properly for mutable ops with no returns

07d0b9b

ghstack-source-id: d651cfa Pull Request resolved: #133044

bdhirsh mentioned this pull request Mar 10, 2025

partitioner: treat inputs with static indices as free to save #148922

Closed

bdhirsh added a commit that referenced this pull request Mar 10, 2025

autograd codegen: bump VC properly for mutable ops with no returns

a8dd101

ghstack-source-id: 8a20eb2 Pull Request resolved: #133044

bdhirsh added a commit that referenced this pull request Mar 12, 2025

autograd codegen: bump VC properly for mutable ops with no returns

d422b26

ghstack-source-id: e9fa357 Pull Request resolved: #133044

github-actions bot added the Stale label May 13, 2025

github-actions bot closed this Jun 12, 2025

github-actions bot deleted the gh/bdhirsh/604/head branch July 13, 2025 02:20
