Commit 1eaa10b

vishwakftw authored and soumith committed
Update torch.distributions documentation (#5050)
* Add a small paragraph for pathwise estimator
* Add differentiability as well
* Add small snippet and clear some grammatical errors
* Update documentation to reflect has_rsample
* Add a fix for ExponentialFamily docs
* Update __init__.py
1 parent 7bd2db9 commit 1eaa10b

File tree

2 files changed, +16 -1 lines changed


docs/source/distributions.rst

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Probability distributions - torch.distributions
 :hidden:`ExponentialFamily`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

-..autoclass:: ExponentialFamily
+.. autoclass:: ExponentialFamily
     :members:

 :hidden:`Bernoulli`

torch/distributions/__init__.py

Lines changed: 15 additions & 0 deletions
@@ -28,6 +28,21 @@
     next_state, reward = env.step(action)
     loss = -m.log_prob(action) * reward
     loss.backward()
+
+Another way to implement these stochastic/policy gradients would be to use the
+reparameterization trick from the :meth:`~torch.distributions.Distribution.rsample`
+method, where the parameterized random variable can be defined as a parameterized
+deterministic function of a parameter-free random variable. The reparameterized sample
+is therefore differentiable. The code for implementing the pathwise estimator would
+be as follows::
+
+    params = policy_network(state)
+    m = Normal(*params)
+    # any distribution with .has_rsample == True could work based on the application
+    action = m.rsample()
+    next_state, reward = env.step(action)  # Assume that reward is differentiable
+    loss = -reward
+    loss.backward()
 """

 from .bernoulli import Bernoulli
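The added snippet above is not self-contained: ``policy_network`` and ``env`` are assumed to exist elsewhere in the training loop. As a usage sketch, the following minimal example exercises the same ``rsample()``-based gradient flow end to end; the tiny linear "policy network", the fake state, the surrogate differentiable reward, and the SGD optimizer are illustrative assumptions, not part of this commit::

    import torch
    from torch.distributions import Normal

    # Hypothetical stand-ins so the example runs on its own; none of these
    # names come from the commit itself.
    policy_network = torch.nn.Linear(4, 2)   # maps a state to (loc, log_scale)
    optimizer = torch.optim.SGD(policy_network.parameters(), lr=1e-2)
    state = torch.randn(4)                   # fake observation

    loc, log_scale = policy_network(state)
    m = Normal(loc, log_scale.exp())         # exp() keeps the scale positive
    assert m.has_rsample                     # Normal supports reparameterized sampling
    action = m.rsample()                     # differentiable w.r.t. loc and scale

    # Surrogate differentiable reward standing in for env.step(); gradients
    # can flow from the reward back into policy_network through rsample().
    reward = -(action - 1.0).pow(2)

    loss = -reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Distributions whose ``has_rsample`` flag is ``False`` (for example ``Bernoulli``) only provide the non-differentiable ``sample()``, so they still require the score-function estimator shown in the surrounding context lines.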
