Skip to content

Conversation

@Skylion007
Copy link
Collaborator

@Skylion007 Skylion007 commented Jan 9, 2025

Clone() copies the gradients too, but we immediately detach them. Detach returns a view of the tensor without it's gradients, and the copies only that subset. Related to #144270

@Skylion007 Skylion007 requested review from albanD and ezyang January 9, 2025 14:48
@Skylion007 Skylion007 requested a review from janeyx99 as a code owner January 9, 2025 14:48
@pytorch-bot
Copy link

pytorch-bot bot commented Jan 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144468

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 8 New Failures

As of commit 84f1442 with merge base 8f54e56 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Skylion007 Skylion007 requested a review from malfet January 9, 2025 14:56
@Skylion007 Skylion007 added better-engineering Relatively self-contained tasks for better engineering contributors release notes: performance_as_product labels Jan 9, 2025
Copy link
Contributor

@janeyx99 janeyx99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, though practically speaking this is likely not changing much perf in most use cases as grad normally does not require grad.

@janeyx99
Copy link
Contributor

janeyx99 commented Jan 9, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 9, 2025
@janeyx99 janeyx99 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jan 9, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable)

Details for Dev Infra team Raised by workflow job

@Skylion007
Copy link
Collaborator Author

@pytorchbot merge -r

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Tried to rebase and push PR #144468, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@pytorchmergebot
Copy link
Collaborator

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

@Skylion007
Copy link
Collaborator Author

@janeyx99 Seems like a false positive, but I would like to get confirmation

@janeyx99
Copy link
Contributor

janeyx99 commented Jan 9, 2025

Ah I think you do have to modify the test now that the version counts are different.

@albanD albanD removed their request for review January 9, 2025 20:01
@github-actions
Copy link
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Mar 10, 2025
@github-actions github-actions bot closed this Apr 9, 2025
@Skylion007 Skylion007 reopened this May 4, 2025
@Skylion007 Skylion007 requested a review from albanD as a code owner May 4, 2025 16:09
@Skylion007
Copy link
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased skylion007/fix-sgd-detach-2025-01-09 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout skylion007/fix-sgd-detach-2025-01-09 && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the skylion007/fix-sgd-detach-2025-01-09 branch from 1b78678 to 84f1442 Compare May 4, 2025 16:11
Copy link
Collaborator

@cyyever cyyever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there still undiscovered clones like these?


if buf is None:
buf = torch.clone(grad).detach()
buf = torch.clone(grad.detach())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use grad.detach().clone ? Since that is a common pattern in the code base.

buf = device_momentum_buffer_list[i] = momentum_buffer_list[
indices[i]
] = torch.clone(device_grads[i]).detach()
] = torch.clone(device_grads[i].detach())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the above.

@albanD
Copy link
Collaborator

albanD commented May 5, 2025

Clone() copies the gradients too

That is NOT true.
torch.clone() and Tensor.clone() do NOT clone the gradients.
Maybe you're thinking of deepcopy(), which indeed does?

@janeyx99 janeyx99 changed the title [BE]: Improve SGD efficiency by cloning less data [BE]: Follow detach().clone() pattern May 5, 2025
Copy link
Contributor

@janeyx99 janeyx99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As clone does not copy gradients, can you update the PR description? We can still follow @cyyever's suggestions here to land cleaner code for BE, even though it will only be marginally better.

@janeyx99 janeyx99 changed the title [BE]: Follow detach().clone() pattern [BE]: Follow detach().clone() pattern for SGD May 5, 2025
@github-actions github-actions bot closed this Jun 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

better-engineering Relatively self-contained tasks for better engineering contributors ciflow/trunk Trigger trunk jobs on your pull request open source release notes: optim Stale triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants