Commit 1eaa10b

vishwakftw authored and soumith committed
Update torch.distributions documentation (#5050)
* Add a small paragraph for pathwise estimator
* Add differentiability as well
* Add small snippet and clear some grammatical errors
* Update documentation to reflect has_rsample
* Add a fix for ExponentialFamily docs
* Update __init__.py
1 parent 7bd2db9 commit 1eaa10b

File tree

2 files changed, +16 -1 lines changed


docs/source/distributions.rst

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Probability distributions - torch.distributions
 :hidden:`ExponentialFamily`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

-..autoclass:: ExponentialFamily
+.. autoclass:: ExponentialFamily
     :members:

 :hidden:`Bernoulli`

torch/distributions/__init__.py

Lines changed: 15 additions & 0 deletions
@@ -28,6 +28,21 @@
     next_state, reward = env.step(action)
     loss = -m.log_prob(action) * reward
     loss.backward()
+
+Another way to implement these stochastic/policy gradients would be to use the
+reparameterization trick from the :meth:`~torch.distributions.Distribution.rsample`
+method, where the parameterized random variable can be defined as a parameterized
+deterministic function of a parameter-free random variable. The reparameterized sample
+is therefore differentiable. The code for implementing the pathwise estimator would
+be as follows::
+
+    params = policy_network(state)
+    m = Normal(*params)
+    # any distribution with .has_rsample == True could work based on the application
+    action = m.rsample()
+    next_state, reward = env.step(action)  # Assume that reward is differentiable
+    loss = -reward
+    loss.backward()
 """

 from .bernoulli import Bernoulli
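The added snippet above is not self-contained: ``policy_network`` and ``env`` are assumed to exist elsewhere in the training loop. As a usage sketch, the following minimal example exercises the same ``rsample()``-based gradient flow end to end; the tiny linear "policy network", the fake state, the surrogate differentiable reward, and the SGD optimizer are illustrative assumptions, not part of this commit::

    import torch
    from torch.distributions import Normal

    # Hypothetical stand-ins so the example runs on its own; none of these
    # names come from the commit itself.
    policy_network = torch.nn.Linear(4, 2)   # maps a state to (loc, log_scale)
    optimizer = torch.optim.SGD(policy_network.parameters(), lr=1e-2)
    state = torch.randn(4)                   # fake observation

    loc, log_scale = policy_network(state)
    m = Normal(loc, log_scale.exp())         # exp() keeps the scale positive
    assert m.has_rsample                     # Normal supports reparameterized sampling
    action = m.rsample()                     # differentiable w.r.t. loc and scale

    # Surrogate differentiable reward standing in for env.step(); gradients
    # can flow from the reward back into policy_network through rsample().
    reward = -(action - 1.0).pow(2)

    loss = -reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Distributions whose ``has_rsample`` flag is ``False`` (for example ``Bernoulli``) only provide the non-differentiable ``sample()``, so they still require the score-function estimator shown in the surrounding context lines.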
