
Conversation

@crcrpar
Collaborator

@crcrpar crcrpar commented Oct 23, 2022

As per title.

  • Q: Do we want torch._foreach_lerp.ScalarList as well?
  • We might want to add ATen/native/cuda/lerp.cuh and include it from both ATen/native/cuda/Lerp.cu and ATen/native/cuda/ForeachTernaryOp.cu
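
For context, a minimal usage sketch of the op this PR adds (assuming both the TensorList-weights and scalar-weight overloads; all variable names are illustrative):

import torch

starts = [torch.zeros(3) for _ in range(4)]
ends = [torch.ones(3) for _ in range(4)]
weights = [torch.full((3,), 0.25) for _ in range(4)]

# out-of-place: each out[i] = starts[i] + weights[i] * (ends[i] - starts[i])
outs = torch._foreach_lerp(starts, ends, weights)

# in-place variant with a scalar weight
torch._foreach_lerp_(starts, ends, 0.5)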

Related:

cc @vadimkantorov @ptrblck

@pytorch-bot pytorch-bot bot added the release notes: foreach_frontend release notes category label Oct 23, 2022
@pytorch-bot

pytorch-bot bot commented Oct 23, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87562

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 9961184:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@vadimkantorov
Contributor

vadimkantorov commented Oct 23, 2022

Also, unfortunately, it doesn't cover all the foreach needs of Adam :(
e.g.:

# combined effect: acc_deltas = rho * acc_deltas + (1 - rho) * deltas * deltas
torch._foreach_mul_(acc_deltas, rho)
torch._foreach_addcmul_(acc_deltas, deltas, deltas, value=1 - rho)

But foreach_lerp is useful for a more idiomatic EMA optimizer anyway (a sketch follows).
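
A minimal sketch of that EMA update, assuming the in-place torch._foreach_lerp_ with a scalar weight (the model and decay below are illustrative):

import torch
import torch.nn as nn

model = nn.Linear(4, 4)
decay = 0.999
ema_params = [p.detach().clone() for p in model.parameters()]

# one EMA step: ema <- decay * ema + (1 - decay) * p, i.e. lerp(ema, p, 1 - decay)
with torch.no_grad():
    torch._foreach_lerp_(ema_params, list(model.parameters()), 1.0 - decay)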

A related discussion on a generalization, ax + by, is in #79352 (comment); maybe a further generalization would be abc + def, where a, b, c, d, e, f can each be scalars or tensors.

@ngimel ngimel added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Oct 23, 2022
@crcrpar crcrpar marked this pull request as draft November 1, 2022 18:59
@crcrpar crcrpar marked this pull request as ready for review November 4, 2022 08:36
@crcrpar
Collaborator Author

crcrpar commented Nov 23, 2022

This PR now integrates SampleInput into ForeachOpInfo and is ready for another review.

@vadimkantorov
Contributor

Another use case for foreach_lerp is implementing explicit, manual running-stats updates for a bunch of batchnorm modules (context in #90342 (comment)).

In this scheme, the running_mean/running_var buffers of the batchnorm modules hold the current batch mean/var, and the update of the running stats is then done separately, by a dedicated optimizer or manually with foreach_lerp (see the sketch below).
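
A sketch of that update, assuming the in-place scalar-weight overload (the modules, stand-in batch statistics, and momentum below are illustrative):

import torch
import torch.nn as nn

model = nn.Sequential(nn.BatchNorm1d(8), nn.BatchNorm1d(8))
bns = [m for m in model.modules() if isinstance(m, nn.BatchNorm1d)]
momentum = 0.1

running_means = [bn.running_mean for bn in bns]
running_vars = [bn.running_var for bn in bns]
# stand-ins for the current batch statistics of each module
batch_means = [torch.zeros(8) for _ in bns]
batch_vars = [torch.ones(8) for _ in bns]

# for every module at once: running <- (1 - momentum) * running + momentum * batch
torch._foreach_lerp_(running_means, batch_means, momentum)
torch._foreach_lerp_(running_vars, batch_vars, momentum)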

@crcrpar
Collaborator Author

crcrpar commented Jan 8, 2023

@ngimel friendly ping

@crcrpar
Collaborator Author

crcrpar commented Jan 9, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 9, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@vadimkantorov
Contributor

@crcrpar Could you please comment in #71683 on what is now implemented? Is the EMA optimizer the only remaining item there? By the way, with this foreach_lerp one can now do elegant manual updates of BatchNorm stats parameters (via a direct call to foreach_lerp or a separate EMA optimizer) and guard against NaN/Inf as one wishes; a sketch follows.
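
One possible NaN/Inf guard, sketched (all names are illustrative, not from this PR): keep the old running value wherever the incoming batch statistic is non-finite, then lerp toward the guarded values.

import torch

running_means = [torch.zeros(4), torch.zeros(4)]
batch_means = [torch.full((4,), 0.5),
               torch.tensor([0.1, float("nan"), float("inf"), 0.2])]

# replace non-finite batch entries with the current running value (no-op update there)
safe_means = [torch.where(torch.isfinite(b), b, r)
              for b, r in zip(batch_means, running_means)]
torch._foreach_lerp_(running_means, safe_means, 0.1)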

@vadimkantorov
Contributor

vadimkantorov commented Jan 9, 2023

There may be another frequent idiom in optimizers (#71683 (comment)) that would be addressed by a fused op for alpha*tensor1*tensor2 + beta*tensor3*tensor4 (written symmetrically, but with memory-load optimizations when tensor1 == tensor2 or tensor3 == tensor4, and when some arguments are simply unset / equal to 1), as proposed in #79352 (comment). The sketch below spells out what such a fusion would replace.
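
Spelled out with today's foreach ops (alpha, beta, and the tensor lists are illustrative); each call below is a separate kernel launch and memory pass, which is exactly what the fusion would save:

import torch

alpha, beta = 0.9, 0.1
t1 = [torch.randn(3) for _ in range(2)]
t2 = [torch.randn(3) for _ in range(2)]
t3 = [torch.randn(3) for _ in range(2)]
t4 = [torch.randn(3) for _ in range(2)]

out = torch._foreach_mul(t1, t2)            # tensor1 * tensor2
torch._foreach_mul_(out, alpha)             # alpha * tensor1 * tensor2
rhs = torch._foreach_mul(t3, t4)            # tensor3 * tensor4
torch._foreach_add_(out, rhs, alpha=beta)   # ... + beta * tensor3 * tensor4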

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 additional jobs have failed; the first few of them are: trunk, trunk / linux-focal-rocm5.3-py3.8 / test (default, 2, 2, linux.rocm.gpu)

Details for Dev Infra team: raised by workflow job

@crcrpar crcrpar force-pushed the foreach_lerp branch 2 times, most recently from 702a886 to 55c0ad4 Compare January 10, 2023 05:27
Commit messages (trimmed): "…and hopefully build as well. I have no idea why previous commits did work even without `<ATen/ops/_foreach_lerp_native.h>`"; "…to use SampleInput". All commits Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>.
@crcrpar
Collaborator Author

crcrpar commented Jan 11, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@crcrpar crcrpar deleted the foreach_lerp branch January 11, 2023 03:11
#define FOREACH_TERNARY_OP(OP) \
std::vector<Tensor> foreach_tensor_ternary_##OP##_slow(TensorList tensors1, TensorList tensors2, TensorList tensors3) { \
check_foreach_api_restrictions(tensors1, tensors2, tensors3); \
std::vector<Tensor> result; \
Collaborator


@crcrpar Probably should have done result.reserve(tensors1.size()); here.
