
Conversation

@crcrpar
Collaborator

@crcrpar crcrpar commented Oct 23, 2022

As per title.

  • Q: Do we want torch._foreach_lerp.ScalarList as well?
  • We might want to add ATen/native/cuda/lerp.cuh and include it from both ATen/native/cuda/Lerp.cu and ATen/native/cuda/ForeachTernaryOp.cu
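
For context, a minimal usage sketch of the op this PR adds (assuming both the TensorList-weights and scalar-weight overloads; all variable names are illustrative):

import torch

starts = [torch.zeros(3) for _ in range(4)]
ends = [torch.ones(3) for _ in range(4)]
weights = [torch.full((3,), 0.25) for _ in range(4)]

# out-of-place: each out[i] = starts[i] + weights[i] * (ends[i] - starts[i])
outs = torch._foreach_lerp(starts, ends, weights)

# in-place variant with a scalar weight
torch._foreach_lerp_(starts, ends, 0.5)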

Related:

cc @vadimkantorov @ptrblck

@pytorch-bot pytorch-bot bot added the release notes: foreach_frontend release notes category label Oct 23, 2022
@pytorch-bot

pytorch-bot bot commented Oct 23, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87562

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 9961184:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@vadimkantorov
Contributor

vadimkantorov commented Oct 23, 2022

Also, unfortunately, it doesn't cover all the foreach needs of Adam :(
e.g.:

# combined effect: acc_deltas = rho * acc_deltas + (1 - rho) * deltas * deltas
torch._foreach_mul_(acc_deltas, rho)
torch._foreach_addcmul_(acc_deltas, deltas, deltas, value=1 - rho)

But foreach_lerp is useful for a more idiomatic EMA optimizer anyway (a sketch follows).
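
A minimal sketch of that EMA update, assuming the in-place torch._foreach_lerp_ with a scalar weight (the model and decay below are illustrative):

import torch
import torch.nn as nn

model = nn.Linear(4, 4)
decay = 0.999
ema_params = [p.detach().clone() for p in model.parameters()]

# one EMA step: ema <- decay * ema + (1 - decay) * p, i.e. lerp(ema, p, 1 - decay)
with torch.no_grad():
    torch._foreach_lerp_(ema_params, list(model.parameters()), 1.0 - decay)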

A related discussion on a generalization, ax + by, is in #79352 (comment); maybe a further generalization would be abc + def, where a, b, c, d, e, f can each be scalars or tensors.

@ngimel ngimel added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Oct 23, 2022
@crcrpar crcrpar marked this pull request as draft November 1, 2022 18:59
@crcrpar crcrpar marked this pull request as ready for review November 4, 2022 08:36
@crcrpar
Collaborator Author

crcrpar commented Nov 23, 2022

This PR now integrates SampleInput into ForeachOpInfo and is ready for another review.

@vadimkantorov
Contributor

Another use case for foreach_lerp is implementing explicit, manual running-stats updates for a bunch of batchnorm modules (context in #90342 (comment)).

In this scheme, the running_mean/running_var buffers of the batchnorm modules hold the current batch mean/var, and the update of the running stats is then done separately, by a dedicated optimizer or manually with foreach_lerp (see the sketch below).
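
A sketch of that update, assuming the in-place scalar-weight overload (the modules, stand-in batch statistics, and momentum below are illustrative):

import torch
import torch.nn as nn

model = nn.Sequential(nn.BatchNorm1d(8), nn.BatchNorm1d(8))
bns = [m for m in model.modules() if isinstance(m, nn.BatchNorm1d)]
momentum = 0.1

running_means = [bn.running_mean for bn in bns]
running_vars = [bn.running_var for bn in bns]
# stand-ins for the current batch statistics of each module
batch_means = [torch.zeros(8) for _ in bns]
batch_vars = [torch.ones(8) for _ in bns]

# for every module at once: running <- (1 - momentum) * running + momentum * batch
torch._foreach_lerp_(running_means, batch_means, momentum)
torch._foreach_lerp_(running_vars, batch_vars, momentum)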

@crcrpar
Collaborator Author

crcrpar commented Jan 8, 2023

@ngimel friendly ping

@crcrpar
Collaborator Author

crcrpar commented Jan 9, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 9, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@vadimkantorov
Contributor

@crcrpar Could you please comment in #71683 on what is now implemented? Is the EMA optimizer the only remaining item there? By the way, with this foreach_lerp one can now do elegant manual updates of BatchNorm stats parameters (via a direct call to foreach_lerp or a separate EMA optimizer) and guard against NaN/Inf as one wishes; a sketch follows.
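
One possible NaN/Inf guard, sketched (all names are illustrative, not from this PR): keep the old running value wherever the incoming batch statistic is non-finite, then lerp toward the guarded values.

import torch

running_means = [torch.zeros(4), torch.zeros(4)]
batch_means = [torch.full((4,), 0.5),
               torch.tensor([0.1, float("nan"), float("inf"), 0.2])]

# replace non-finite batch entries with the current running value (no-op update there)
safe_means = [torch.where(torch.isfinite(b), b, r)
              for b, r in zip(batch_means, running_means)]
torch._foreach_lerp_(running_means, safe_means, 0.1)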

@vadimkantorov
Contributor

vadimkantorov commented Jan 9, 2023

There may be another frequent idiom in optimizers (#71683 (comment)) that would be addressed by a fused op for alpha*tensor1*tensor2 + beta*tensor3*tensor4 (written symmetrically, but with memory-load optimizations when tensor1 == tensor2 or tensor3 == tensor4, and when some arguments are simply unset / equal to 1), as proposed in #79352 (comment). The sketch below spells out what such a fusion would replace.
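
Spelled out with today's foreach ops (alpha, beta, and the tensor lists are illustrative); each call below is a separate kernel launch and memory pass, which is exactly what the fusion would save:

import torch

alpha, beta = 0.9, 0.1
t1 = [torch.randn(3) for _ in range(2)]
t2 = [torch.randn(3) for _ in range(2)]
t3 = [torch.randn(3) for _ in range(2)]
t4 = [torch.randn(3) for _ in range(2)]

out = torch._foreach_mul(t1, t2)            # tensor1 * tensor2
torch._foreach_mul_(out, alpha)             # alpha * tensor1 * tensor2
rhs = torch._foreach_mul(t3, t4)            # tensor3 * tensor4
torch._foreach_add_(out, rhs, alpha=beta)   # ... + beta * tensor3 * tensor4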

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 additional jobs have failed; the first few of them are: trunk, trunk / linux-focal-rocm5.3-py3.8 / test (default, 2, 2, linux.rocm.gpu)

Details for Dev Infra team: raised by workflow job

@crcrpar crcrpar force-pushed the foreach_lerp branch 2 times, most recently from 702a886 to 55c0ad4 Compare January 10, 2023 05:27
Commit messages (trimmed): "…and hopefully build as well. I have no idea why previous commits did work even without `<ATen/ops/_foreach_lerp_native.h>`"; "…to use SampleInput". All commits Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>.
@crcrpar
Collaborator Author

crcrpar commented Jan 11, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@crcrpar crcrpar deleted the foreach_lerp branch January 11, 2023 03:11
#define FOREACH_TERNARY_OP(OP) \
std::vector<Tensor> foreach_tensor_ternary_##OP##_slow(TensorList tensors1, TensorList tensors2, TensorList tensors3) { \
check_foreach_api_restrictions(tensors1, tensors2, tensors3); \
std::vector<Tensor> result; \
Collaborator


@crcrpar Probably should have done result.reserve(tensors1.size()); here.
