Apply saved tensor hooks #60975
Conversation
Summary: Fixes #58512 Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
@Varal7 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Fixes #58512 Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D29466227](https://our.internmc.facebook.com/intern/diff/D29466227) [ghstack-poisoned]
Fixes #58659. This PR builds directly on top of #58512.

~~Creates a context manager `with torch.autograd.graph.saved_tensors_default_hooks(pack, unpack)` that can be used on the Python side.~~ Exposes a pair of functions to Python users: `torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)` and `torch.autograd.graph.reset_saved_tensors_default_hooks()`. These functions control the hooks applied to saved tensors: all tensors *saved* while the default hooks are set will be packed using the `pack` function, then unpacked accordingly when needed.

Currently, this works by simply calling `register_hooks` (cf. #60975) directly at the end of the constructor of a `SavedVariable`. This could be optimized further by not performing the copy before registering default hooks, but that would require a small refactor. Edit: the refactor is done in #61927.

A current limitation is that if users create tensors while default hooks are set, they will not be able to register additional hooks on the saved tensors.

For instance, to achieve something like #28997, one could define a `pack` function that saves the tensor to disk whenever it is too big and returns a filename, and an `unpack` function that simply reads the content of the file back into a tensor, e.g.:

```python
def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```

Differential Revision: [D29792193](https://our.internmc.facebook.com/intern/diff/D29792193)

[ghstack-poisoned]
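For illustration, here is a more self-contained sketch of how the pair of functions above might be used end to end. The `tmp_dir` setup via `tempfile` and the toy forward pass are assumptions made for the example; the `set_saved_tensors_default_hooks` / `reset_saved_tensors_default_hooks` names are the ones exposed by this stack.

```python
import os
import tempfile
import uuid

import torch

# Hypothetical temporary directory standing in for `tmp_dir` above.
tmp_dir = tempfile.mkdtemp()

def pack(x):
    # Write the saved tensor to disk and keep only the filename in the graph.
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    # Read the tensor back from disk when backward needs it.
    return torch.load(name)

a = torch.randn(5, requires_grad=True)
b = torch.randn(5, requires_grad=True)

# Every tensor saved for backward below is packed with `pack`.
torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)
y = (a * b).sum()  # mul saves a and b (as filenames) for backward
torch.autograd.graph.reset_saved_tensors_default_hooks()

y.backward()  # unpack reloads a and b from disk
print(a.grad, b.grad)
```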
Summary: Pull Request resolved: #61834

Expose a pair of functions to Python users: `torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)` and `torch.autograd.graph.reset_saved_tensors_default_hooks()`. These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.

Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.

A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.

For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:

```
def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29792193

Pulled By: Varal7

fbshipit-source-id: 33e931230ef59faa3ec8b5d11ef7c05539bce77c
Add section to the Autograd mechanics docs to describe the recently exposed saved tensors (pytorch#52451), how to register packing / unpacking hooks (pytorch#60975) and how to use default hooks (pytorch#61834)
Summary: Expose a pair of functions to Python users: `torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)` and `torch.autograd.graph.reset_saved_tensors_default_hooks()`. These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.

Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.

A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.

For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:

```
def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```

Relanding previous PR: #61834

Original PR led to timeout error in: https://www.internalfb.com/mast/job/yuguo-release_canary_offline_training-inlinecvrp_a-canary_offline_train_28a7ecfc

Now passing: https://www.internalfb.com/mast/job/quach-release_canary_offline_training-inlinecvrp_a-canary_offline_train_9bb57e98

The difference with the new version is we don't need to acquire the GIL when calling `PyDefaultSavedVariableHooks::get_hooks`.

[ghstack-poisoned]
Summary: Pull Request resolved: #62563

Expose a pair of functions to Python users: `torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)` and `torch.autograd.graph.reset_saved_tensors_default_hooks()`. These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.

Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.

A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.

For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:

```
def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```

Relanding previous PR: #61834

Original PR led to timeout error in: https://www.internalfb.com/mast/job/yuguo-release_canary_offline_training-inlinecvrp_a-canary_offline_train_28a7ecfc

Now passing: https://www.internalfb.com/mast/job/quach-release_canary_offline_training-inlinecvrp_a-canary_offline_train_9bb57e98

The difference with the new version is we don't need to acquire the GIL when calling `PyDefaultSavedVariableHooks::get_hooks`.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30045405

Pulled By: Varal7

fbshipit-source-id: 7f6c07af3a56fe8835d5edcc815c15ea4fb4e332
Summary: Add section to the Autograd mechanics docs to describe the recently exposed saved tensors (#52451), how to register packing / unpacking hooks (#60975) and how to use default hooks (#61834) Sister PR: #62361 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first) Pull Request resolved: #62362 Reviewed By: soulitzer Differential Revision: D30453177 Pulled By: Varal7 fbshipit-source-id: f5759977b069ff0ef36a47b08856d297691a6caa
Stack from ghstack:
Summary: Fixes #58512
Uses the hooks introduced in #60663: upon registering, the pack hook is called on the saved tensor and the returned Python object is stored. From then on, whenever we need to unpack it, we use that Python object in combination with the unpack hook.
The packing can be done with gradient tracking disabled, since we add back the correct grad_fn during unpacking.
In-place operations performed by the pack hook on the original tensor (in the case `leaf || !output`) will be caught if the saved tensor is used by another op. In-place operations performed by the unpack hook will unfortunately not be caught; we will add a warning to the docs in the PR that follows this one.
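To make the mechanism above concrete, here is a minimal sketch of what registering a pair of hooks on a single saved tensor could look like from Python. It assumes the `grad_fn._raw_saved_self.register_hooks(pack, unpack)` binding described in the follow-up docs PR (#62362); the pack hook fires at registration time and its return value is stored, while the unpack hook is invoked during backward.

```python
import torch

def pack(x):
    # Called when the hook is registered: the returned object replaces
    # the saved tensor in the autograd graph.
    print("packing a tensor of shape", tuple(x.shape))
    return x.detach().clone()

def unpack(stored):
    # Called when backward needs the saved tensor back; autograd restores
    # the correct grad_fn on the returned tensor.
    print("unpacking")
    return stored

a = torch.randn(5, requires_grad=True)
y = a.pow(2)

# pow saved `a` for backward; register the hooks on that saved tensor.
# The pack hook runs immediately here.
y.grad_fn._raw_saved_self.register_hooks(pack, unpack)

y.sum().backward()  # unpack runs when the saved tensor is needed
print(a.grad)       # equals 2 * a
```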
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D29466227