[distributions] Always enable grad when calculating lazy_property #7708
Conversation
```python
test()
with torch.no_grad():
    test()
```
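This fragment appears to come from the new test added by the PR: the same checks are run once normally and once under `torch.no_grad()`, verifying that a `lazy_property` first evaluated in no-grad mode still records autograd history. A minimal standalone sketch of that pattern (the `Dummy` class and helper name are illustrative, not the actual test code):

```python
import torch
from torch.distributions.utils import lazy_property

def check_lazy_property_grad():
    x = torch.randn(1, requires_grad=True)

    class Dummy(object):
        @lazy_property
        def y(self):
            return x + 1

    # With the fix, evaluating the lazy property records autograd history
    # even when this helper runs inside a torch.no_grad() block.
    Dummy().y.backward()
    assert x.grad is not None

check_lazy_property_grad()
with torch.no_grad():
    check_lazy_property_grad()
```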
```diff
         if instance is None:
             return self
-        value = self.wrapped(instance)
+        with torch.enable_grad():
+            value = self.wrapped(instance)
```
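For context, the whole descriptor after this change reads roughly as below. This is a sketch reconstructed from the hunk above and the discussion that follows, not a verbatim copy of `torch/distributions/utils.py`; the key point is that the wrapped method always runs with grad enabled and its result is cached on the instance.

```python
import functools
import torch

class lazy_property(object):
    """Cache the wrapped method's result on first attribute access."""

    def __init__(self, wrapped):
        self.wrapped = wrapped
        functools.update_wrapper(self, wrapped)

    def __get__(self, instance, obj_type=None):
        if instance is None:
            return self
        # Force grad mode so the cached value carries autograd history even
        # if the first access happens inside a torch.no_grad() block.
        with torch.enable_grad():
            value = self.wrapped(instance)
        # Shadow the descriptor with the computed value on the instance,
        # so subsequent accesses are plain attribute lookups.
        setattr(instance, self.wrapped.__name__, value)
        return value
```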
cc @fritzo (sorry, didn't notice you're already tagged)
cc @neerajprad It seems a little forceful to always enable grad. One option that seems consistent to me is to inherit the grad mode at construction time:

```python
class Distribution(object):
    def __init__(self, ...):
        ...
        self._grad_enabled = torch.autograd.get_grad_enabled()  # what is the actual syntax?

class lazy_property(object):
    ...
    def __get__(self, instance, obj_type=None):
        if instance is None:
            return self
        with torch.autograd.set_grad_enabled(instance._grad_enabled):
            value = self.wrapped(instance)
        setattr(instance, self.wrapped.__name__, value)
        return value
```

However, it's unclear whether we should be clever or whether we should simply require strict usage: "grad_enabled should have a single value throughout the lifetime of a distribution object". The distributions are intended to be flyweight objects and to be cheap to reconstruct in each grad_enabled context.
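For what the "cheap to reconstruct" discipline would look like in user code, here is an illustrative sketch (not code from the PR): rather than carrying one distribution object across grad contexts, a fresh one is built inside whichever context it is used in.

```python
import torch
from torch.distributions import Normal

loc = torch.zeros(3, requires_grad=True)
scale = torch.ones(3)

# Differentiable use: build and consume the distribution with grad enabled.
loss = -Normal(loc, scale).log_prob(torch.zeros(3)).sum()
loss.backward()

# Non-differentiable use: rebuild the distribution inside no_grad instead of
# reusing the one above, so the grad mode at construction matches the grad
# mode in which it is actually used.
with torch.no_grad():
    samples = Normal(loc, scale).sample((10,))
```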
I think the "constant grad mode" invariant is a bit too easy to get wrong. Forcing grad mode doesn't seem that bad really. If your distribution parameters don't require grad then it will be a no-op. Otherwise it's quite likely that you will be interested in differentiating those parts, and I think it's better to trade off some memory for compute in this case. It's a hard problem, because you don't have any information about how the object will be used. You either need to drop the autograd history, hoping that someone who calls …
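The no-op claim is easy to check directly; in an illustrative snippet (not from the PR), forcing grad mode around a computation whose inputs do not require grad builds no graph:

```python
import torch

params = torch.randn(3)  # requires_grad defaults to False

with torch.no_grad():
    # torch.enable_grad() re-enables grad mode here, but since `params` does
    # not require grad, no autograd graph is recorded anyway.
    with torch.enable_grad():
        out = params.exp()

print(out.requires_grad)  # False
print(out.grad_fn)        # None
```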
@fritzo @apaszke @neerajprad have we reached consensus on whether we should merge this? :)
This seems reasonable to me (I'm convinced by @apaszke).
…torch#7708)
* Always enable grad when calculating lazy_property
* Add test with MultivariateNormal
Fixes #7705.
@apaszke @fritzo @alicanb