Conversation

ssnl (Collaborator) commented on May 19, 2018


The added test runs the same checks both with grad enabled and under torch.no_grad():

test()
with torch.no_grad():
    test()

The change under discussion wraps the lazy_property computation in torch.enable_grad():

if instance is None:
    return self
with torch.enable_grad():
    value = self.wrapped(instance)
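To make the effect concrete, here is a small usage sketch (the Gaussian class is hypothetical; lazy_property is the helper in torch.distributions.utils). Because the first access caches the computed value on the instance, the grad mode in effect at that moment is what every later access sees.

import torch
from torch.distributions.utils import lazy_property

class Gaussian(object):            # hypothetical stand-in for a Distribution
    def __init__(self, scale):
        self.scale = scale

    @lazy_property
    def variance(self):            # evaluated once, then cached on the instance
        return self.scale ** 2

scale = torch.tensor(2.0, requires_grad=True)
g = Gaussian(scale)
with torch.no_grad():
    v = g.variance                 # first access happens under no_grad
print(v.requires_grad)             # True with the enable_grad wrapper; False without it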

apaszke (Contributor) commented on May 20, 2018

cc @fritzo (sorry, didn't notice you're already tagged)

fritzo (Collaborator) commented on May 20, 2018

cc @neerajprad

It seems a little forceful to always enable grad. One option that seems consistent to me is to inherit grad_enabled from its setting at initialization time:

class Distribution(object):
    def __init__(self, ...):
        ...
        self._grad_enabled = torch.is_grad_enabled()  # capture the grad mode at construction

class lazy_property(object):
    ...
    def __get__(self, instance, obj_type=None):
        if instance is None:
            return self
        with torch.autograd.set_grad_enabled(instance._grad_enabled):
            value = self.wrapped(instance)
        setattr(instance, self.wrapped.__name__, value)
        return value

However, it's unclear whether we should be clever here or simply require strict usage: "grad_enabled should have a single value throughout the lifetime of a distribution object". The distributions are intended to be flyweight objects that are cheap to reconstruct in each grad_enabled context.
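For illustration, the flyweight convention described above might look like the following sketch (Normal is just a convenient example; none of this code is from the PR): rebuild the distribution inside each grad context instead of reusing one whose lazy properties were cached under a different mode.

import torch
from torch.distributions import Normal

loc = torch.tensor(0.0, requires_grad=True)

with torch.no_grad():
    d_eval = Normal(loc, 1.0)      # evaluation-only object; cached values carry no history
    x = d_eval.sample()

d_train = Normal(loc, 1.0)         # cheap to reconstruct for the grad-enabled phase
loss = -d_train.log_prob(x)
loss.backward()
print(loc.grad)                    # gradients flow to loc as expected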

apaszke (Contributor) commented on May 20, 2018

I think the "constant grad mode" invariant is a bit too easy to get wrong. Forcing grad mode doesn't seem that bad really. If your distribution parameters don't require grad then it will be a no-op. Otherwise it's quite likely that you will be interested in differentiating those parts, and I think it's better to trade off some memory for compute in this case.

It's a hard problem, because you don't have any information about how the object will be used. You either need to drop the autograd history, hoping that someone who calls .sample() will never need it, or you need to just live with the fact that you might end up wasting some memory, because we temporarily enabled grad. Provided the distribution is short-lived the downside is somewhat irrelevant though, because the properties should get removed quickly. If we find out that this strategy doesn't work too well, we can always add an extra constructor parameter that lets you choose a strategy for caching those.
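The no-op point is easy to check with a standalone snippet (not from the PR): re-enabling grad is harmless when no input requires grad, and only records history when one does.

import torch

x = torch.randn(3)                 # requires_grad=False
with torch.no_grad():
    with torch.enable_grad():
        y = (x * 2).sum()
print(y.requires_grad)             # False: nothing requires grad, so no graph is built

p = torch.randn(3, requires_grad=True)
with torch.no_grad():
    with torch.enable_grad():
        z = (p * 2).sum()
print(z.requires_grad)             # True: history is recorded, trading memory for compute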

ssnl (Collaborator, Author) commented on May 24, 2018

@fritzo @apaszke @neerajprad have we reached consensus on whether we should merge this? :)

fritzo (Collaborator) commented on May 24, 2018

This seems reasonable to me (I'm convinced by @apaszke).

ssnl merged commit c946db1 into pytorch:master on May 24, 2018
ssnl deleted the grad_lazy_prop branch on May 24, 2018 at 15:22
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request on Jun 11, 2018:

Always enable grad when calculating lazy_property (pytorch#7708)

* Always enable grad when calculating lazy_property
* Add test with MultivariateNormal

Successfully merging this pull request may close these issues.

[distributions] rsample().detach() and sample() yields different gradients
