[stateless] add weight tying support #90477
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90477. Note: links to docs will display an error until the docs builds have completed.
❌ 2 Failures as of commit d07d26f: FLAKY. The following jobs failed but were likely due to flakiness present on master.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This changed the algorithm from make_functional and from the original PR #87079, so here are some new numbers. Running a benchmark script on resnet-18 (no tied parameters), functional_call went from 58% slower than vanilla (not using functional_call at all) to 63% slower, which is a 3.5% slowdown relative to functional_call before this PR. Raw numbers were attached in case anyone wants to see them.
If I'm understanding the numbers, they are measuring some fixed number of iterations of resnet18 using the functional_call API. Shouldn't resnet be faster on GPU than CPU?
Yep! They're running it 10 times, and this is the average time as reported.
Yeah, I didn't explain these tables well. From the docs, my understanding is that the profiler reports the amount of time used by the CPU and the GPU separately. In this case, what we're worried about is the CPU time, since the code for the stateless call runs on the CPU.
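For anyone who wants to reproduce this kind of comparison, a minimal sketch follows. It is not the script from the thread (which used the profiler's separate CPU/GPU times); it only mirrors the setup described above (resnet-18, 10 iterations, averaged), uses simple wall-clock timing, and assumes torchvision is available.

```python
import time

import torch
import torchvision.models as models
from torch.nn.utils.stateless import functional_call

model = models.resnet18().eval()
# functional_call takes a flat name -> tensor mapping of parameters and buffers
params_and_buffers = {
    **dict(model.named_parameters()),
    **dict(model.named_buffers()),
}
x = torch.randn(4, 3, 224, 224)

def bench(fn, iters=10):
    # average wall-clock seconds per iteration
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

with torch.no_grad():
    vanilla = bench(lambda: model(x))
    stateless = bench(lambda: functional_call(model, params_and_buffers, (x,)))

print(f"vanilla: {vanilla:.4f}s/iter, functional_call: {stateless:.4f}s/iter, "
      f"overhead: {100 * (stateless / vanilla - 1):.1f}%")
```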
zou3519 left a comment:
Code looks correct, some questions
```python
parameters_and_buffers: Dict[str, Tensor],
args: Union[Any, Tuple],
kwargs: Dict[str, Any] = None,
tie_weights: bool = True,
```
This change is technically BC-breaking (does it matter?). What's our deprecation plan for nn.utils.stateless.functional_call?
The easiest thing to do seems to be:
- nn.utils.stateless.functional_call retains the same behavior as before (tie_weights=False?)
- we deprecate nn.utils.stateless.functional_call in the next version of PyTorch (so, now on master)
- we introduce a new torch.func.functional_call (probably needs a better name), with our preferred default (tie_weights=True), to replace it in the next version of PyTorch
> This change is technically BC-breaking (does it matter?). What's our deprecation plan for nn.utils.stateless.functional_call?
Yep, this is BC-breaking on a beta feature. It came out of earlier talks with @albanD, where we figured the default behavior should be what most people expect (which, judging from the make_functional requests, is to have the tied weights change together).
Since we are moving it to torch.func anyway, I'm fine with changing the default back to match the old behavior for now and breaking it when we do the move, unless @albanD has other thoughts?
Both can work, really. But it might be simplest to have the torch.func version be an alias of the old version.
And yes, from earlier discussions, I think this BC break is OK.
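For context on what the flag changes, here is a minimal sketch of the two behaviors being debated. The toy module and tensor names are invented for illustration; only the tie_weights flag itself comes from this PR.

```python
import torch
import torch.nn as nn
from torch.nn.utils.stateless import functional_call

class Tied(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(2, 2, bias=False)
        self.b = nn.Linear(2, 2, bias=False)
        self.b.weight = self.a.weight  # "a.weight" and "b.weight" are one tensor

    def forward(self, x):
        return self.a(x) + self.b(x)

m = Tied()
x = torch.randn(3, 2)
zeros = torch.zeros(2, 2)

# tie_weights=True: substituting either tied name substitutes the shared
# tensor everywhere it appears, so the whole output is zero.
out_tied = functional_call(m, {"a.weight": zeros}, (x,), tie_weights=True)
print(out_tied.abs().sum())  # tensor(0.)

# tie_weights=False (the pre-PR behavior): only a.weight is replaced; b still
# uses the original tensor, so the output is generally nonzero.
out_untied = functional_call(m, {"a.weight": zeros}, (x,), tie_weights=False)
print(out_untied.abs().sum())  # > 0
```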
zou3519 left a comment:
Code LGTM, minor comments. The algorithm feels a bit weird, but I can't come up with anything simpler.
torch/nn/utils/stateless.py (outdated):
```python
parameters_and_buffers: Dict[str, Tensor],
args: Union[Any, Tuple],
kwargs: Dict[str, Any] = None,
tie_weights: bool = False,
```
nit: "weights" can refer to only parameters, "tie_weights" effectively ties both parameters and buffers. Is that a problem? If we think it's a problem, we can rename it tie_weights_and_buffers. I kind of like the shorter name (and we can document that it applies to both parameters and buffers), but open to suggestions.
Docstring for nn.utils.stateless.functional_call needs to be updated with new flag and details on what it does (unless the plan was to just update torch.func.functional_call)
I'm somewhat attached to just "weights", since the original paper calls it weight tying (granted, there they only tie the weights/parameters). However, I hear you that it's ambiguous, so the docstring explicitly calls out that it ties both parameters and buffers.
SGTM
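To illustrate that the flag really covers buffers too, a small sketch (modules and buffer names invented for illustration):

```python
import torch
import torch.nn as nn
from torch.nn.utils.stateless import functional_call

class Inner(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("scale", torch.ones(1))

    def forward(self, x):
        return x * self.scale

class Outer(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = Inner()
        self.b = Inner()
        self.b.scale = self.a.scale  # "a.scale" and "b.scale" alias one buffer

    def forward(self, x):
        return self.a(x) + self.b(x)

m = Outer()
x = torch.ones(3)

# Substituting one tied buffer name affects both when tie_weights=True...
print(functional_call(m, {"a.scale": torch.zeros(1)}, (x,), tie_weights=True))   # tensor([0., 0., 0.])
# ...but only the named one when tie_weights=False (b still contributes x * 1).
print(functional_call(m, {"a.scale": torch.zeros(1)}, (x,), tie_weights=False))  # tensor([1., 1., 1.])
```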
```diff
-def _swap_parameters(module, tensor_name: str, full_path: str, tensor: Tensor) -> None:
+def _create_swap_params(params_and_buffers):
+    def _swap_parameters(module, tensor_name: str, full_path: str, tensor: Optional[Tensor]) -> None:
```
What is tensor being used for?
It's unused, but previously it was the original weight. Since the original weight was harder to get for tied weights, I passed None and had to update the signature to match. I can add a patch on top of this to remove that parameter, since it appears unused.
SGTM
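For readers without the diff open, a rough sketch of the closure shape being discussed follows. This is a simplification, not the actual PyTorch source (the real code also restores the original tensors after the call); it just shows why the tensor argument ends up unused once the replacement is looked up by full_path.

```python
from typing import Dict, Optional

import torch.nn as nn
from torch import Tensor

def _create_swap_params(params_and_buffers: Dict[str, Tensor]):
    # The factory captures params_and_buffers so the inner callback can find
    # the replacement by its dotted name (full_path) instead of receiving it.
    def _swap_parameters(module: nn.Module, tensor_name: str,
                         full_path: str, tensor: Optional[Tensor]) -> None:
        # `tensor` (the module's original attribute) is kept only to satisfy
        # the callback signature; it is never read.
        replacement = params_and_buffers[full_path]
        if tensor_name in module._parameters:
            # assign through the dict so a plain Tensor can stand in where a
            # Parameter normally lives
            module._parameters[tensor_name] = replacement  # type: ignore[assignment]
        elif tensor_name in module._buffers:
            module._buffers[tensor_name] = replacement
        else:
            setattr(module, tensor_name, replacement)
    return _swap_parameters
```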
albanD left a comment:
Thanks a lot!
```python
def add_to_name_map(n: str, t: torch.Tensor):
    # if the tensor hasn't been seen before, add it to the map
    if t not in weight_to_name_and_tied_names:
        weight_to_name_and_tied_names[t] = (n, set()) if n in names else (None, {n})
```
For n in names: I am not sure what exact kind of object the dict_keys object is. What is the complexity of this lookup?
O(1) according to the internet: https://stackoverflow.com/questions/17539367/python-dictionary-keys-in-complexity
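To make the surrounding bookkeeping concrete, here is a hedged sketch of the map this helper builds (the wrapper function and loop structure are assumptions; only add_to_name_map's first branch mirrors the diff). Each underlying tensor maps to the user-given name, if any, plus the set of other names tied to it; membership tests on tensor keys are the O(1) lookups discussed above, since tensors hash by identity.

```python
from typing import Dict, Optional, Set, Tuple

import torch
from torch import Tensor

def build_tied_map(module: torch.nn.Module,
                   names: Set[str]) -> Dict[Tensor, Tuple[Optional[str], Set[str]]]:
    # tensor -> (name the user passed in, set of other names tied to it)
    weight_to_name_and_tied_names: Dict[Tensor, Tuple[Optional[str], Set[str]]] = {}

    def add_to_name_map(n: str, t: Tensor) -> None:
        # if the tensor hasn't been seen before, add it to the map
        if t not in weight_to_name_and_tied_names:
            weight_to_name_and_tied_names[t] = (n, set()) if n in names else (None, {n})
        else:
            given, tied = weight_to_name_and_tied_names[t]
            if given is None and n in names:
                # a later name turns out to be the one the user passed in
                weight_to_name_and_tied_names[t] = (n, tied)
            else:
                tied.add(n)

    # remove_duplicate=False is what surfaces tied names in the first place
    for n, t in module.named_parameters(remove_duplicate=False):
        add_to_name_map(n, t)
    for n, t in module.named_buffers(remove_duplicate=False):
        add_to_name_map(n, t)
    return weight_to_name_and_tied_names
```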
@pytorchbot merge -f "failures from flaky test and unrelated mps test"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
cc @zou3519 @Chillee @soumith