-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Invert ownership between PyFunction and THPFunction. #22983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes #16532 and #14960. This patch is a massive hack. The way I constructed it was I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so all existing code works. Essentially, this patch assumes that PyFunction stays live as long as you have a THPFunction: intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live. But as you can see from the presently skipped tests (specifically, test_hook_none), this is not always true. But it seems to be true for the code we care about, and that's enough for me! Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
|
Subsumes #16888 |
|
There's some subsequent cleanup after this inversion that is possible, but I want to make sure this works first. |
| { | ||
| for (const auto& hook : self->cdata.pre_hooks()) { | ||
| auto cdata = self->cdata.lock(); | ||
| TORCH_INTERNAL_ASSERT(cdata); // could do this optionally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this guaranteed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not :) Here's what I have in my local copy:
// Traverse and clear are required for supporting Python's GC cycle handling.
static int THPFunction_traverse(THPFunction *self, visitproc visit, void *arg)
{
- for (const auto& hook : self->cdata.pre_hooks()) {
- if (auto pyhook = dynamic_cast<PyFunctionPreHook*>(hook.get())) {
- Py_VISIT(pyhook->dict);
+ auto cdata = self->cdata.lock();
+ // cdata could be null if someone constructed a legacy function but haven't
+ // actually called backward() on it yet.
+ if (cdata) {
+ for (const auto& hook : cdata->pre_hooks()) {
+ if (auto pyhook = dynamic_cast<PyFunctionPreHook*>(hook.get())) {
+ Py_VISIT(pyhook->dict);
+ }
}
- }
- for (const auto& hook : self->cdata.post_hooks()) {
- if (auto pyhook = dynamic_cast<PyFunctionPostHook*>(hook.get())) {
- Py_VISIT(pyhook->dict);
+ for (const auto& hook : cdata->post_hooks()) {
+ if (auto pyhook = dynamic_cast<PyFunctionPostHook*>(hook.get())) {
+ Py_VISIT(pyhook->dict);
+ }
}
}
Waiting for CI to finish on the current iteration before I push though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you going to remove the other TORCH_INTERNAL_ASSERT(cdata) calls? I don't think any of them are safe.
| self->cdata.set_next_edges(std::move(input_info.next_edges)); | ||
| Py_INCREF(self); | ||
| auto cdata = std::shared_ptr<PyFunction>(new PyFunction(THPObjectPtr((PyObject*)self)), deleteFunction); | ||
| self->cdata = cdata; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that this method is called multiple times on the same THPFunction, so that in some invocations self->cdata is already non-NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in the legacy case it might be non-NULL. I actually have no idea what the "right" thing to do here is, because legacy autograd function behavior is nuts (all the better for @yf225 to delete them). Check this out:
macbook-pro-116:~ ezyang$ cat abb.py
import torch
from torch.autograd import Function
class Id(Function):
def forward(self, x):
self.save_for_backward(x)
return x
def backward(self, grad_x):
x = self.saved_tensors
print(x)
return x
f = Id()
x1 = torch.zeros(1, requires_grad=True)
x2 = torch.ones(1, requires_grad=True)
print(f)
y = f(x1)
print(y.grad_fn)
z = f(x2)
print(z.grad_fn)
y.backward()
macbook-pro-116:~ ezyang$ python abb.py
<__main__.Id object at 0x116d58850>
<__main__.Id object at 0x116d58850>
<__main__.Id object at 0x116d58850>
(tensor([1.], requires_grad=True),)
The second call to f clobbers the saved tensors of the first call (in y), because f is literally the same grad_fn in both cases, even though this is totally wrong. I think the API is misused here, for a legacy Function, you should only call __apply__ on it once.
On reflection, maybe the right thing is to not allocate a new PyFunction if there already is one. I can make this adjustment, but really this program is stupid and we shouldn't support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've modified legacy to warn when this happens.
Stack from [ghstack](https://github.com/ezyang/ghstack): * **#22983 Invert ownership between PyFunction and THPFunction.** Fixes #16532 and #14960. This patch is a massive hack. The way I constructed it was I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so all existing code works. Essentially, this patch assumes that PyFunction stays live as long as you have a THPFunction: intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live. But as you can see from the presently skipped tests (specifically, test_hook_none), this is not always true. But it seems to be true for the code we care about, and that's enough for me! Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. - Generalizing the issue above: this is BC-breaking for legacy autograd Functions in the following way: previously, the PyFunction for a THPFunction was generated immediately on construction of the Function `f` (which would eventually serve as the `grad_fn` of an object when you finally call `f(x)`.) This means that operations like `f.register_hook()` which previously worked prior to `f(x)` no longer work; you have to wait until you actually call the function to do registration. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Fixes #16532 and #14960. This patch is a massive hack. The way I constructed it was I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so all existing code works. Essentially, this patch assumes that PyFunction stays live as long as you have a THPFunction: intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live. But as you can see from the presently skipped tests (specifically, test_hook_none), this is not always true. But it seems to be true for the code we care about, and that's enough for me! Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. Signed-off-by: Edward Z. Yang <ezyang@fb.com> ghstack-source-id: b00842b Pull Request resolved: #22983
|
@ezyang I don't see any additional concerns with this PR (except the one @colesbury mentioned above), but I am not confident to give it a stamp either. Shall we bring more people to take a look? |
test/test_autograd.py
Outdated
| self.assertEqual(counter[0], 1, 'bw_hook not called') | ||
| self.assertEqual(x.grad.data, torch.ones(5, 5) * 2) | ||
|
|
||
| @unittest.skip("Currently broken, will be fixed in https://github.com/pytorch/pytorch/pull/22925") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#22925 is merged (yay!), feel free to rebase :)
| with warnings.catch_warnings(record=True) as w: | ||
| z = f(x2) | ||
| self.assertIn('extending-torch-autograd', str(w[0].message)) | ||
| y.backward() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we actually call z.backward() here, since we are actually checking whether x2.grad is populated correctly?
Also, should we add an assert showing that the clobbering happens to x1.grad when y.backward() is called after f is called twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y.backward() results in the clobber behavior. I'm not really interested in the functional correctness of clobbering (the behavior is nuts); I'm just mostly interested that we don't segfault in this situation.
apaszke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good and I think it makes sense (NB it would be good to link to the explanation you wrote in another PR, because the description here is not very self-contained).
I am a bit worried about what would happen if someone was to hold on to the Python instance for too long (i.e. longer than the lifetime of the PyFunction from its use), and then try to access the attributes or reuse this. I think that would trigger your internal asserts, meaning they're not really that internal.
| // The actual PyFunction (in the autograd graph) that this data was | ||
| // saved for. This field may be NULL (because a user can construct | ||
| // a THPFunction directly from Python), but when this field is non-NULL, | ||
| // it is guaranteed that cdata.lock()->obj == this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we enforce that Python Function objects always have their corresponding PyFunctions by allocating one whenever you instantiate an object? That way we could avoid having a weak pointer, and could hold PyFunction* will the invariant that it would always be non-null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also seem to be using the assumption that .lock() succeeds quite liberally in this patch, so this comment is highly confusing for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we enforce that Python Function objects always have their corresponding PyFunctions by allocating one whenever you instantiate an object?
There are a few layers to this question, so I want to address them separately.
In the non-legacy case, why can't we allocate a PyFunction in THPFunction_new so that THPFunction is always allocated with an associated PyFunction? The proximal cause is dumb: it's because THPFunction only holds a weak pointer to PyFunction, so if you allocate a PyFunction inside THPFunction_new, the only pointer that would be keeping it live is a weak pointer, and it would immediately go dead once THPFunction_new returns.
"But wait, Edward!" you might say, "Doesn't THPFunction_apply have this problem too?" No! We keep an owning reference to PyFunction for the entire body of this function, and eventually we will store an OWNING reference to PyFunction in the grad_fn of the Variables we return (in process_outputs); that owning reference will keep PyFunction live when we return. But there's no way to setup this owning reference in the constructor THPFunction_new, so we have to set it up somewhere else. It's a bit finely balanced, but in the non-legacy case, as long as you don't go constructing copies of THPFunction outside of apply (it is possible to do this, but not in the public API), we maintain the invariant that every THPFunction has a corresponding PyFunction.
In the legacy case, why can't we guarantee that there is an associated PyFunction? The first answer is, the legacy case is dumb and we shouldn't support it. But digging deeper, I think the root problem is that THPFunction (the ctx object) is what we expose to the world as grad_fn, when actually we should have exposed some sort of Python binding of PyFunction. So there is a confusion: we treat THPFunction as a Python binding as PyFunction, but we ALSO treat it as the ctx object we pass in when doing autograd. These two use cases are incompatible, but right now they're smooshed together in a single object.
The suggestion you've made here is treating THPFunction as just a Python binding for PyFunction. Which is reasonable, but a much more involved change.
|
|
||
| Py_INCREF((PyObject*)self); | ||
| return std::shared_ptr<PyFunction>(&self->cdata, Decref()); | ||
| return self->cdata.lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come we no longer need to do the incref/decref here? IIRC debug builds of Python report refcounting errors if we don't keep the C++/Python refcounts approximately in sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. I don't understand why it was previously needed. I guess I can build a debug build of Python and see if I still get refcounting errors. If I don't, would that be sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a debug build of PyTorch with this patch and it has not reported any refcounting errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a debug build of PyTorch, a debug build of CPython. I once made it such that we always use a single shared_ptr to manage PyFunction, which has started triggering Python's GC asserts and the refcounting scheme has been updated by @colesbury accordingly. We should check that this patch does not suffer from the same issues as mine.
Also, there used to be this warning that you're not supposed to take weak pointers of PyFunctions because they can be managed by multiple shared_ptr blocks. That's no longer the case after this patch, right?
| auto cdata = self->cdata.lock(); | ||
| // cdata could be null if someone constructed a legacy function but haven't | ||
| // actually called backward() on it yet. | ||
| if (cdata) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: put the assignment into the if condition
| auto f = (THPFunction*) obj; | ||
| auto name = std::string(Py_TYPE(f)->tp_name); | ||
| THPObjectPtr _legacy(PyObject_GetAttrString(obj, "_is_legacy")); | ||
| THPObjectPtr _legacy(PyObject_GetAttrString((PyObject*)obj, "_is_legacy")); // CONST?! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a nicer cast?
|
|
||
| @unittest.skipIf(not TEST_MULTIGPU, "multi-GPU not supported") | ||
| @skipIfRocm | ||
| def test_data_parallel_function_deletion(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this related to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests the original bug that spawned this line of inquiry :)
| "from the second invocation will clobber the saved tensors from the " | ||
| "first invocation. Please consider rewriting your autograd function " | ||
| "in the modern style; for information on the new format, please see: " | ||
| "https://pytorch.org/docs/stable/notes/extending.html#extending-torch-autograd"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we use a legacy function once, free the backward, and try to apply it again? That wouldn't trigger this warning, so is that safe, or is it just another bug that is not covered by this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Legacy is nuts so I don't really know for certain, but I think that it is technically safe. Because the backward is freed, there's nothing to clobber anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could tighten up the check by marking the legacy function as "consumed" once you apply it once (and error if you apply it again), but this is all best effort warning because we opted not to remove legacy functions this release cycle...
|
Also, I would be ok if we simply deleted the legacy paths today. They have been deprecated since at least 0.3 and it's just a pain to maintain them. Also, IIRC JIT will blow up if you try to mix them. |
Yeah, @colesbury complained about this in #22998 so I'm probably going to have to move around data little so it is guaranteed to stay live. None of the current test cases exercise the codepath but I certainly believe it's broken. |
Yeah, me too XD. But IIUC they are somewhat widely used so we really should have an explicit deprecation notice. |
Stack from [ghstack](https://github.com/ezyang/ghstack): * **#22983 Invert ownership between PyFunction and THPFunction.** Fixes #16532 and #14960. Let us introduce our dramatis personae: * THPFunction - a PyObject that represents a ctx object that is passed to our custom Python autograd functions. It contains a PyFunction * PyFunction - a subclass of Function which says how to compute the actual backward function. Note that this is a closure in the sense that it is associated with a ctx object which actually contains the saved tensors. It contains a non-owning pointer to THPFunction OK, do you see something wrong with this situation? The ownership here is totally backwards! It is PyFunction that should own the THPFunction, because a PyFunction is a closure that should own its context. To put it in other words, it is OK if a PyFunction gets freed before THPFunction, but not vice versa. This sounds nice, but making this happen in practice is a bit trickier, because there are cases where we do rely on ownership the wrong way. This patch simply "assumes" that ownership goes the correct direction in practice. The way I constructed it was I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so all existing code works. Essentially, this patch assumes that PyFunction stays live as long as you have a THPFunction: intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live. But as you can see from the presently skipped tests (specifically, test_hook_none), this is not always true. But it seems to be true for the code we care about, and that's enough for me! Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. - Generalizing the issue above: this is BC-breaking for legacy autograd Functions in the following way: previously, the PyFunction for a THPFunction was generated immediately on construction of the Function `f` (which would eventually serve as the `grad_fn` of an object when you finally call `f(x)`.) This means that operations like `f.register_hook()` which previously worked prior to `f(x)` no longer work; you have to wait until you actually call the function to do registration. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Stack from [ghstack](https://github.com/ezyang/ghstack): * **#22983 Invert ownership between PyFunction and THPFunction.** Fixes #16532 and #14960. Whence this fix? Let's introduce the dramatis personae: * THPFunction - a PyObject that represents a ctx object that is passed to our custom Python autograd functions. It contains a PyFunction * PyFunction - a subclass of Function which says how to compute the actual backward function. Note that this is a closure in the sense that it is associated with a ctx object which actually contains the saved tensors. It contains a non-owning pointer to THPFunction OK, do you see something wrong with this situation? The ownership here is totally backwards! It is PyFunction that should own the THPFunction, because a PyFunction is a closure that should own its context. To put it in other words, it is OK if a PyFunction gets freed before THPFunction, but not vice versa. This sounds nice, but making this happen in practice is a bit trickier, because sometimes we use THPFunction to represent the PyFunction. In particular, we represent a `grad_fn` object in the Python binding level as the THPFunction (I can only imagine we did this because we wanted the object identity of `grad_fn` to line up with the class the user provided). I did not have time to untangle this situation, so for this patch, in the cases where we're improperly treating a THPFunction as a PyFunction, we simply assume that the PyFunction is still live (it usually is), and yell at the user if it's not. I constructed this patch in a few steps: 1. First, I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so that I could have an escape hatch when the existing code improperly treated THPFunction as an owning reference to PyFunction. In some cases your code might break because of this, but most of the time the PyFunction is live and everything works out (intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live). 2. There are some cases where PyFunction is *provably* live at some point in time. In that case, don't bother extracting a strong reference from the weak pointer, just pass the strong pointer along directly! 3. For all the other cases where we can't prove it, write a long error message to help the user out when they run into that situation. Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. - Generalizing the issue above: this is BC-breaking for legacy autograd Functions in the following way: previously, the PyFunction for a THPFunction was generated immediately on construction of the Function `f` (which would eventually serve as the `grad_fn` of an object when you finally call `f(x)`.) This means that operations like `f.register_hook()` which previously worked prior to `f(x)` no longer work; you have to wait until you actually call the function to do registration. I also removed dead Function::get_shared_ptr and Decref struct; these were previously needed for the crazy split refcounting scheme we had previously, but now are dead code. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Stack from [ghstack](https://github.com/ezyang/ghstack): * **#22983 Invert ownership between PyFunction and THPFunction.** Fixes #16532 and #14960. Whence this fix? Let's introduce the dramatis personae: * THPFunction - a PyObject that represents a ctx object that is passed to our custom Python autograd functions. It contains a PyFunction * PyFunction - a subclass of Function which says how to compute the actual backward function. Note that this is a closure in the sense that it is associated with a ctx object which actually contains the saved tensors. It contains a non-owning pointer to THPFunction OK, do you see something wrong with this situation? The ownership here is totally backwards! It is PyFunction that should own the THPFunction, because a PyFunction is a closure that should own its context. To put it in other words, it is OK if a PyFunction gets freed before THPFunction, but not vice versa. This sounds nice, but making this happen in practice is a bit trickier, because sometimes we use THPFunction to represent the PyFunction. In particular, we represent a `grad_fn` object in the Python binding level as the THPFunction (I can only imagine we did this because we wanted the object identity of `grad_fn` to line up with the class the user provided). I did not have time to untangle this situation, so for this patch, in the cases where we're improperly treating a THPFunction as a PyFunction, we simply assume that the PyFunction is still live (it usually is), and yell at the user if it's not. I constructed this patch in a few steps: 1. First, I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so that I could have an escape hatch when the existing code improperly treated THPFunction as an owning reference to PyFunction. In some cases your code might break because of this, but most of the time the PyFunction is live and everything works out (intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live). 2. There are some cases where PyFunction is *provably* live at some point in time. In that case, don't bother extracting a strong reference from the weak pointer, just pass the strong pointer along directly! 3. For all the other cases where we can't prove it, write a long error message to help the user out when they run into that situation. Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. - Generalizing the issue above: this is BC-breaking for legacy autograd Functions in the following way: previously, the PyFunction for a THPFunction was generated immediately on construction of the Function `f` (which would eventually serve as the `grad_fn` of an object when you finally call `f(x)`.) This means that operations like `f.register_hook()` which previously worked prior to `f(x)` no longer work; you have to wait until you actually call the function to do registration. I also removed dead Function::get_shared_ptr and Decref struct; these were previously needed for the crazy split refcounting scheme we had previously, but now are dead code. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Stack from [ghstack](https://github.com/ezyang/ghstack): * **#22983 Invert ownership between PyFunction and THPFunction.** Fixes #16532 and #14960. Whence this fix? Let's introduce the dramatis personae: * THPFunction - a PyObject that represents a ctx object that is passed to our custom Python autograd functions. It contains a PyFunction * PyFunction - a subclass of Function which says how to compute the actual backward function. Note that this is a closure in the sense that it is associated with a ctx object which actually contains the saved tensors. It contains a non-owning pointer to THPFunction OK, do you see something wrong with this situation? The ownership here is totally backwards! It is PyFunction that should own the THPFunction, because a PyFunction is a closure that should own its context. To put it in other words, it is OK if a PyFunction gets freed before THPFunction, but not vice versa. This sounds nice, but making this happen in practice is a bit trickier, because sometimes we use THPFunction to represent the PyFunction. In particular, we represent a `grad_fn` object in the Python binding level as the THPFunction (I can only imagine we did this because we wanted the object identity of `grad_fn` to line up with the class the user provided). I did not have time to untangle this situation, so for this patch, in the cases where we're improperly treating a THPFunction as a PyFunction, we simply assume that the PyFunction is still live (it usually is), and yell at the user if it's not. I constructed this patch in a few steps: 1. First, I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so that I could have an escape hatch when the existing code improperly treated THPFunction as an owning reference to PyFunction. In some cases your code might break because of this, but most of the time the PyFunction is live and everything works out (intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live). 2. There are some cases where PyFunction is *provably* live at some point in time. In that case, don't bother extracting a strong reference from the weak pointer, just pass the strong pointer along directly! 3. For all the other cases where we can't prove it, write a long error message to help the user out when they run into that situation. Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. - Generalizing the issue above: this is BC-breaking for legacy autograd Functions in the following way: previously, the PyFunction for a THPFunction was generated immediately on construction of the Function `f` (which would eventually serve as the `grad_fn` of an object when you finally call `f(x)`.) This means that operations like `f.register_hook()` which previously worked prior to `f(x)` no longer work; you have to wait until you actually call the function to do registration. I also removed dead Function::get_shared_ptr and Decref struct; these were previously needed for the crazy split refcounting scheme we had previously, but now are dead code. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Fixes #16532 and #14960. This patch is a massive hack. The way I constructed it was I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so all existing code works. Essentially, this patch assumes that PyFunction stays live as long as you have a THPFunction: intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live. But as you can see from the presently skipped tests (specifically, test_hook_none), this is not always true. But it seems to be true for the code we care about, and that's enough for me! Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. Signed-off-by: Edward Z. Yang <ezyang@fb.com> ghstack-source-id: 00cfcf5 Pull Request resolved: #22983
|
I've resolved all of the code review comments. |
Stack from [ghstack](https://github.com/ezyang/ghstack): * **#22983 Invert ownership between PyFunction and THPFunction.** Fixes #1653, #14960 and #18619. Whence this fix? Let's introduce the dramatis personae: * THPFunction - a PyObject that represents a ctx object that is passed to our custom Python autograd functions. It contains a PyFunction * PyFunction - a subclass of Function which says how to compute the actual backward function. Note that this is a closure in the sense that it is associated with a ctx object which actually contains the saved tensors. It contains a non-owning pointer to THPFunction OK, do you see something wrong with this situation? The ownership here is totally backwards! It is PyFunction that should own the THPFunction, because a PyFunction is a closure that should own its context. To put it in other words, it is OK if a PyFunction gets freed before THPFunction, but not vice versa. This sounds nice, but making this happen in practice is a bit trickier, because sometimes we use THPFunction to represent the PyFunction. In particular, we represent a `grad_fn` object in the Python binding level as the THPFunction (I can only imagine we did this because we wanted the object identity of `grad_fn` to line up with the class the user provided). I did not have time to untangle this situation, so for this patch, in the cases where we're improperly treating a THPFunction as a PyFunction, we simply assume that the PyFunction is still live (it usually is), and yell at the user if it's not. I constructed this patch in a few steps: 1. First, I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so that I could have an escape hatch when the existing code improperly treated THPFunction as an owning reference to PyFunction. In some cases your code might break because of this, but most of the time the PyFunction is live and everything works out (intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live). 2. There are some cases where PyFunction is *provably* live at some point in time. In that case, don't bother extracting a strong reference from the weak pointer, just pass the strong pointer along directly! 3. For all the other cases where we can't prove it, write a long error message to help the user out when they run into that situation. Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. - Generalizing the issue above: this is BC-breaking for legacy autograd Functions in the following way: previously, the PyFunction for a THPFunction was generated immediately on construction of the Function `f` (which would eventually serve as the `grad_fn` of an object when you finally call `f(x)`.) This means that operations like `f.register_hook()` which previously worked prior to `f(x)` no longer work; you have to wait until you actually call the function to do registration. I also removed dead Function::get_shared_ptr and Decref struct; these were previously needed for the crazy split refcounting scheme we had previously, but now are dead code. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Fixes #16532 and #14960. This patch is a massive hack. The way I constructed it was I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so all existing code works. Essentially, this patch assumes that PyFunction stays live as long as you have a THPFunction: intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live. But as you can see from the presently skipped tests (specifically, test_hook_none), this is not always true. But it seems to be true for the code we care about, and that's enough for me! Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. Signed-off-by: Edward Z. Yang <ezyang@fb.com> ghstack-source-id: c515578 Pull Request resolved: #22983
colesbury
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Some minor comments below
| // (otherwise the condition is trivially true). Then there is a PyFunction | ||
| // which contains an owning reference to this object. But we are only | ||
| // allowed to clear if all owning references are gone! Contradiction. | ||
| TORCH_INTERNAL_ASSERT(!self->cdata.lock()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use self->cdata.expired() here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also see https://cplusplus.github.io/LWG/lwg-active.html#2751. Note that THPFunction_clear is typically called during shared_ptr destructor. Seems likely that the desired behavior will be guaranteed in future revisions so probably OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks that is a very nice point :)
| self->cdata.set_next_edges(std::move(input_info.next_edges)); | ||
| Py_INCREF(self); | ||
| std::shared_ptr<PyFunction> cdata; | ||
| if (cdata = self->cdata.lock()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assignment is easy to misread here. Consider refactoring it (move the assignment to the definition of cdata) or adding an explicit comparison to make it more obvious.
| auto& input_info = info_pair.second; | ||
| bool is_executable = input_info.is_executable; | ||
| self->cdata.set_next_edges(std::move(input_info.next_edges)); | ||
| Py_INCREF(self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this Py_INCREF(self) for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's so that I can give self as an owning reference to THPObjectPtr. Which means I'm leaking in the TORCH_WARN case. Better move it in!
| auto num_args = PyTuple_GET_SIZE(inputs); | ||
| THPObjectPtr ctx_input_tuple(PyTuple_New(num_args + 1)); | ||
| PyTuple_SET_ITEM(ctx_input_tuple.get(), 0, ctx_obj.release()); | ||
| Py_INCREF(ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This predates your change, but the code is missing a check that ctx_input_tuple is non-null (i.e. that the PyTuple_New succeeded)
Stack from [ghstack](https://github.com/ezyang/ghstack): * **#22983 Invert ownership between PyFunction and THPFunction.** Fixes #1653, #14960 and #18619. Whence this fix? Let's introduce the dramatis personae: * THPFunction - a PyObject that represents a ctx object that is passed to our custom Python autograd functions. It contains a PyFunction * PyFunction - a subclass of Function which says how to compute the actual backward function. Note that this is a closure in the sense that it is associated with a ctx object which actually contains the saved tensors. It contains a non-owning pointer to THPFunction OK, do you see something wrong with this situation? The ownership here is totally backwards! It is PyFunction that should own the THPFunction, because a PyFunction is a closure that should own its context. To put it in other words, it is OK if a PyFunction gets freed before THPFunction, but not vice versa. This sounds nice, but making this happen in practice is a bit trickier, because sometimes we use THPFunction to represent the PyFunction. In particular, we represent a `grad_fn` object in the Python binding level as the THPFunction (I can only imagine we did this because we wanted the object identity of `grad_fn` to line up with the class the user provided). I did not have time to untangle this situation, so for this patch, in the cases where we're improperly treating a THPFunction as a PyFunction, we simply assume that the PyFunction is still live (it usually is), and yell at the user if it's not. I constructed this patch in a few steps: 1. First, I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so that I could have an escape hatch when the existing code improperly treated THPFunction as an owning reference to PyFunction. In some cases your code might break because of this, but most of the time the PyFunction is live and everything works out (intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live). 2. There are some cases where PyFunction is *provably* live at some point in time. In that case, don't bother extracting a strong reference from the weak pointer, just pass the strong pointer along directly! 3. For all the other cases where we can't prove it, write a long error message to help the user out when they run into that situation. Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. - Generalizing the issue above: this is BC-breaking for legacy autograd Functions in the following way: previously, the PyFunction for a THPFunction was generated immediately on construction of the Function `f` (which would eventually serve as the `grad_fn` of an object when you finally call `f(x)`.) This means that operations like `f.register_hook()` which previously worked prior to `f(x)` no longer work; you have to wait until you actually call the function to do registration. I also removed dead Function::get_shared_ptr and Decref struct; these were previously needed for the crazy split refcounting scheme we had previously, but now are dead code. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Fixes #16532 and #14960. This patch is a massive hack. The way I constructed it was I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so all existing code works. Essentially, this patch assumes that PyFunction stays live as long as you have a THPFunction: intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live. But as you can see from the presently skipped tests (specifically, test_hook_none), this is not always true. But it seems to be true for the code we care about, and that's enough for me! Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. Signed-off-by: Edward Z. Yang <ezyang@fb.com> ghstack-source-id: a25e840 Pull Request resolved: #22983
apaszke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments, but LGTM
| namespace pybind11 { namespace detail { | ||
|
|
||
| // handle Python <-> torch::autograd::Function conversions | ||
| template <> struct type_caster<std::shared_ptr<torch::autograd::Function>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this BC breaking for people who used to poke around .grad_fn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, Python side grad_fn poking still works after this patch, so that didn't exercise this code. I think this specifically only affects people who were writing pybind11 bindings for autograd functions. I have the tiniest violin.
| auto f = (THPFunction*) obj; | ||
| auto name = std::string(Py_TYPE(f)->tp_name); | ||
| THPObjectPtr _legacy(PyObject_GetAttrString(obj, "_is_legacy")); | ||
| // Python API functions are not const-correct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They cannot be, because attribute lookups can have arbitrary side effects 😄
| if (auto pyhook = dynamic_cast<PyFunctionPreHook*>(hook.get())) { | ||
| Py_VISIT(pyhook->dict); | ||
| // cdata could be null if someone constructed a legacy function but haven't | ||
| // actually called backward() on it yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or if the backward function went "out of scope"
| // trigger deletion of other objects, and they can reference this function, | ||
| // seeing it in a half-deleted state. | ||
| auto pre_hooks = std::move(self->cdata.pre_hooks()); | ||
| auto post_hooks = std::move(self->cdata.post_hooks()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid that this was actually important, but 🤷🏻♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're clear here. Proving that I don't need to do this is a big reason why we have the TORCH_INTERNAL_ASSERT(self->cdata.expired()); assert at the top of the function. The claim is that PyFunction will always destruct first, and delete all of the hook objects, BEFORE we start deletion of THPFunction.
| } | ||
| THPUtils_assert(PyTuple_GET_SIZE(raw_grad_output) == self->cdata.num_inputs(), | ||
| auto cdata = self->cdata.lock(); | ||
| // In obscure situations, cdata might be nullptr because it's expired. THAT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhhhh are you sure? If it's expired then how did we get to do_backward? Isn't it the PyFunction that calls this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also just call _do_backward straight up. So the order of operations is, you save out the grad_fn, free the variables, and then call grad_fn._do_backward. It's dumb, not worth making it work better, but we do have an error check if this does happen :)
| "was called. This could occur if you manually called _do_backward on Function. " | ||
| "In any case, this is very naughty! If you absolutely need this to work, " | ||
| "try porting your code to use non-legacy autograd function, see: " | ||
| "https://pytorch.org/docs/stable/notes/extending.html#extending-torch-autograd"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh lol, I think this is a bit paranoid, but w/e
| self->cdata.add_pre_hook(std::move(hook)); | ||
| auto cdata = self->cdata.lock(); | ||
| TORCH_CHECK(cdata, | ||
| "Legacy autograd function had _register_hook_dict called before the function was " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very bad error message to tell the user. Please figure out which methods call this one and write something that doesn't mention internal APIs. This gives completely no guidance about the potential fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand it's legacy, so I guess I don't really care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's easy enough to fix.
| // This is really a true assert, because we've already tested for the | ||
| // self->has_freed_buffers case at the beginning of this function: | ||
| // buffers are freed when PyFunction dies; if the buffers are not freed, | ||
| // PyFunction must be live. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm are you sure? I think those are orthogonal conditions. has_freed_buffers is only triggered by running backward, but you can have cdata get freed with no backward runs involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, it's a superset condition. If PyFunction is destructed, has_freed_buffers is guaranteed to be true, but not vice versa. That's good enough here.
Stack from [ghstack](https://github.com/ezyang/ghstack): * **#22983 Invert ownership between PyFunction and THPFunction.** Fixes #1653, #14960 and #18619. Whence this fix? Let's introduce the dramatis personae: * THPFunction - a PyObject that represents a ctx object that is passed to our custom Python autograd functions. It contains a PyFunction * PyFunction - a subclass of Function which says how to compute the actual backward function. Note that this is a closure in the sense that it is associated with a ctx object which actually contains the saved tensors. It contains a non-owning pointer to THPFunction OK, do you see something wrong with this situation? The ownership here is totally backwards! It is PyFunction that should own the THPFunction, because a PyFunction is a closure that should own its context. To put it in other words, it is OK if a PyFunction gets freed before THPFunction, but not vice versa. This sounds nice, but making this happen in practice is a bit trickier, because sometimes we use THPFunction to represent the PyFunction. In particular, we represent a `grad_fn` object in the Python binding level as the THPFunction (I can only imagine we did this because we wanted the object identity of `grad_fn` to line up with the class the user provided). I did not have time to untangle this situation, so for this patch, in the cases where we're improperly treating a THPFunction as a PyFunction, we simply assume that the PyFunction is still live (it usually is), and yell at the user if it's not. I constructed this patch in a few steps: 1. First, I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so that I could have an escape hatch when the existing code improperly treated THPFunction as an owning reference to PyFunction. In some cases your code might break because of this, but most of the time the PyFunction is live and everything works out (intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live). 2. There are some cases where PyFunction is *provably* live at some point in time. In that case, don't bother extracting a strong reference from the weak pointer, just pass the strong pointer along directly! 3. For all the other cases where we can't prove it, write a long error message to help the user out when they run into that situation. Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. - Generalizing the issue above: this is BC-breaking for legacy autograd Functions in the following way: previously, the PyFunction for a THPFunction was generated immediately on construction of the Function `f` (which would eventually serve as the `grad_fn` of an object when you finally call `f(x)`.) This means that operations like `f.register_hook()` which previously worked prior to `f(x)` no longer work; you have to wait until you actually call the function to do registration. I also removed dead Function::get_shared_ptr and Decref struct; these were previously needed for the crazy split refcounting scheme we had previously, but now are dead code. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Fixes #16532 and #14960. This patch is a massive hack. The way I constructed it was I flipped the ownership between PyFunction and THPFunction, but maintained a weak pointer from THPFunction to PyFunction so all existing code works. Essentially, this patch assumes that PyFunction stays live as long as you have a THPFunction: intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live. But as you can see from the presently skipped tests (specifically, test_hook_none), this is not always true. But it seems to be true for the code we care about, and that's enough for me! Some subtleties: - PyFunction is a C++ object that refers to a PyObject. This means it needs a custom deleter to handle deleting the PyObject, since you can't assume you have the GIL when it dies. - The old test_gc_in_destructor failed our internal assert because we never actually ran a backwards, and thus never actually materialized PyFunction. I'm chalking this up as "misuse of API" and rewrote the test to not have this problem. Signed-off-by: Edward Z. Yang <ezyang@fb.com> ghstack-source-id: 75917fb Pull Request resolved: #22983
Stack from ghstack:
Fixes #1653, #14960 and #18619.
Whence this fix? Let's introduce the dramatis personae:
OK, do you see something wrong with this situation? The ownership here is totally backwards! It is PyFunction that should own the THPFunction, because a PyFunction is a closure that should own its context. To put it in other words, it is OK if a PyFunction gets freed before THPFunction, but not vice versa.
This sounds nice, but making this happen in practice is a bit trickier, because sometimes we use THPFunction to represent the PyFunction. In particular, we represent a
grad_fnobject in the Python binding level as the THPFunction (I can only imagine we did this because we wanted the object identity ofgrad_fnto line up with the class the user provided). I did not have time to untangle this situation, so for this patch, in the cases where we're improperly treating a THPFunction as a PyFunction, we simply assume that the PyFunction is still live (it usually is), and yell at the user if it's not.I constructed this patch in a few steps:
THPFunction to PyFunction so that I could have an escape hatch when the existing code improperly treated THPFunction as an owning reference to PyFunction. In some cases your code might break because of this, but most of the time the PyFunction is live and everything works out (intuitively, this makes sense, since the ctx object should only really stay live as long as you're actually going to execute the backwards, which will keep the PyFunction live).
Some subtleties:
a custom deleter to handle deleting the PyObject, since you can't assume
you have the GIL when it dies.
actually ran a backwards, and thus never actually materialized PyFunction.
I'm chalking this up as "misuse of API" and rewrote the test to not have
this problem.
in the following way: previously, the PyFunction for a THPFunction was generated
immediately on construction of the Function
f(which would eventually serve asthe
grad_fnof an object when you finally callf(x).) This means thatoperations like
f.register_hook()which previously worked prior tof(x)no longerwork; you have to wait until you actually call the function to do registration.
I also removed dead Function::get_shared_ptr and Decref struct; these were previously needed for the crazy split refcounting scheme we had previously, but now are dead code.
Signed-off-by: Edward Z. Yang ezyang@fb.com
Differential Revision: D16422209