@@ -278,13 +278,21 @@ class RNN(RNNBase):
           Defaults to zero if not provided.
 
     Outputs: output, h_n
-        - **output** of shape `(seq_len, batch, hidden_size * num_directions)`: tensor
+        - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor
           containing the output features (`h_k`) from the last layer of the RNN,
           for each `k`. If a :class:`torch.nn.utils.rnn.PackedSequence` has
           been given as the input, the output will also be a packed sequence.
+
+          For the unpacked case, the directions can be separated
+          using ``output.view(seq_len, batch, num_directions, hidden_size)``,
+          with forward and backward being direction `0` and `1` respectively.
+          Similarly, the directions can be separated in the packed case.
         - **h_n** (num_layers * num_directions, batch, hidden_size): tensor
           containing the hidden state for `k = seq_len`.
 
+          Like *output*, the layers can be separated using
+          ``h_n.view(num_layers, num_directions, batch, hidden_size)``.
+
     Attributes:
         weight_ih_l[k]: the learnable input-hidden weights of the k-th layer,
             of shape `(hidden_size * input_size)` for `k = 0`. Otherwise, the shape is
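To make the new wording concrete, here is a minimal sketch of separating the directions of `output` and the layers of `h_n` for a bidirectional RNN; the sizes are illustrative only and not part of the diff.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (not taken from the diff).
seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
num_directions = 2  # bidirectional

rnn = nn.RNN(input_size, hidden_size, num_layers, bidirectional=True)
inp = torch.randn(seq_len, batch, input_size)
output, h_n = rnn(inp)  # h_0 defaults to zero

# output: (seq_len, batch, num_directions * hidden_size)
# -> (seq_len, batch, num_directions, hidden_size); direction 0 is forward, 1 is backward.
output = output.view(seq_len, batch, num_directions, hidden_size)
forward_out, backward_out = output[:, :, 0, :], output[:, :, 1, :]

# h_n: (num_layers * num_directions, batch, hidden_size)
# -> (num_layers, num_directions, batch, hidden_size)
h_n = h_n.view(num_layers, num_directions, batch, hidden_size)
```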
@@ -377,12 +385,20 @@ class LSTM(RNNBase):
 
 
     Outputs: output, (h_n, c_n)
-        - **output** of shape `(seq_len, batch, hidden_size * num_directions)`: tensor
+        - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor
           containing the output features `(h_t)` from the last layer of the LSTM,
           for each t. If a :class:`torch.nn.utils.rnn.PackedSequence` has been
           given as the input, the output will also be a packed sequence.
+
+          For the unpacked case, the directions can be separated
+          using ``output.view(seq_len, batch, num_directions, hidden_size)``,
+          with forward and backward being direction `0` and `1` respectively.
+          Similarly, the directions can be separated in the packed case.
         - **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
-          containing the hidden state for `t = seq_len`
+          containing the hidden state for `t = seq_len`.
+
+          Like *output*, the layers can be separated using
+          ``h_n.view(num_layers, num_directions, batch, hidden_size)`` and similarly for *c_n*.
         - **c_n** (num_layers * num_directions, batch, hidden_size): tensor
           containing the cell state for `t = seq_len`
 
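The same reshaping applies to the LSTM's returned states; a short sketch assuming a 2-layer bidirectional `nn.LSTM` (again with arbitrary sizes):

```python
import torch
import torch.nn as nn

# Illustrative sizes only (not taken from the diff).
seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
num_directions = 2

lstm = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)
output, (h_n, c_n) = lstm(torch.randn(seq_len, batch, input_size))

# h_n and c_n: (num_layers * num_directions, batch, hidden_size)
# -> (num_layers, num_directions, batch, hidden_size); e.g. index [-1, 0]
# is the last layer's forward-direction state.
h_n = h_n.view(num_layers, num_directions, batch, hidden_size)
c_n = c_n.view(num_layers, num_directions, batch, hidden_size)
last_forward_h = h_n[-1, 0]
```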
@@ -457,13 +473,21 @@ class GRU(RNNBase):
           Defaults to zero if not provided.
 
     Outputs: output, h_n
-        - **output** of shape `(seq_len, batch, hidden_size * num_directions)`: tensor
+        - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor
           containing the output features h_t from the last layer of the GRU,
           for each t. If a :class:`torch.nn.utils.rnn.PackedSequence` has been
           given as the input, the output will also be a packed sequence.
+
+          For the unpacked case, the directions can be separated
+          using ``output.view(seq_len, batch, num_directions, hidden_size)``,
+          with forward and backward being direction `0` and `1` respectively.
+          Similarly, the directions can be separated in the packed case.
         - **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
           containing the hidden state for `t = seq_len`
 
+          Like *output*, the layers can be separated using
+          ``h_n.view(num_layers, num_directions, batch, hidden_size)``.
+
     Attributes:
         weight_ih_l[k] : the learnable input-hidden weights of the :math:`\text{k}^{th}` layer
             (W_ir|W_iz|W_in), of shape `(3*hidden_size x input_size)`
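The GRU text also notes that the directions can be separated in the packed case; one way to do that (a sketch with made-up sizes, going through `pack_padded_sequence`/`pad_packed_sequence`, which are not part of this diff) is to unpack the output and then apply the same view:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Illustrative sizes only (not taken from the diff).
seq_len, batch, input_size, hidden_size, num_layers = 4, 3, 8, 16, 1
num_directions = 2

gru = nn.GRU(input_size, hidden_size, num_layers, bidirectional=True)
inp = torch.randn(seq_len, batch, input_size)
lengths = [4, 3, 2]  # per-sequence lengths, sorted in decreasing order for packing

packed_output, h_n = gru(pack_padded_sequence(inp, lengths))

# Unpack, then separate the directions exactly as in the unpacked case.
output, _ = pad_packed_sequence(packed_output)  # (seq_len, batch, num_directions * hidden_size)
output = output.view(seq_len, batch, num_directions, hidden_size)

# h_n: (num_layers * num_directions, batch, hidden_size)
h_n = h_n.view(num_layers, num_directions, batch, hidden_size)
```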