[TOC]
RNN Cells and additional RNN operations. See @{$python/contrib.rnn} guide.
Abstract object representing an RNN cell.
The definition of cell in this package differs from the definition used in the literature. In the literature, cell refers to an object with a single scalar output. The definition in this package refers to a horizontal array of such units.
An RNN cell, in the most abstract setting, is anything that has
a state and performs some operation that takes a matrix of inputs.
This operation results in an output matrix with self.output_size columns.
If self.state_size is an integer, this operation also results in a new
state matrix with self.state_size columns. If self.state_size is a
tuple of integers, then it results in a tuple of len(state_size) state
matrices, each with a column size corresponding to values in state_size.
This module provides a number of basic commonly used RNN cells, such as
LSTM (Long Short Term Memory) or GRU (Gated Recurrent Unit), and a number
of operators that add dropout, projections, or embeddings for inputs.
Constructing multi-layer cells is supported by the class MultiRNNCell,
or by calling the rnn ops several times. Every RNNCell must have the
properties below and implement __call__ with the following signature.
Run this RNN cell on inputs, starting from the given state.
- inputs: 2-D tensor with shape [batch_size x input_size].
- state: if self.state_size is an integer, this should be a 2-D Tensor with shape [batch_size x self.state_size]. Otherwise, if self.state_size is a tuple of integers, this should be a tuple with shapes [batch_size x s] for s in self.state_size.
- scope: VariableScope for the created subgraph; defaults to class name.

A pair containing:

- Output: A 2-D tensor with shape [batch_size x self.output_size].
- New state: Either a single 2-D tensor, or a tuple of tensors matching the arity and shapes of state.
Integer or TensorShape: size of outputs produced by this cell.
size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
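The zero-state rule above can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the library's implementation: `zero_state_sketch` is a hypothetical helper, nested lists stand in for tensors, and only integer sizes and (nested) tuples or lists of integers are handled.

```python
def zero_state_sketch(batch_size, state_size):
    """Return zero-filled state(s) mirroring the structure of state_size.

    Hypothetical helper: nested lists stand in for 2-D tensors.
    """
    if isinstance(state_size, int):
        # A single state matrix of shape [batch_size x state_size].
        return [[0.0] * state_size for _ in range(batch_size)]
    # A (possibly nested) tuple or list: recurse and keep the same structure.
    return type(state_size)(zero_state_sketch(batch_size, s) for s in state_size)


single = zero_state_sketch(2, 3)            # one 2 x 3 matrix
nested = zero_state_sketch(2, (3, (4, 5)))  # tuple with the same nesting
```

Here `nested` is a 2-tuple whose second element is itself a 2-tuple, matching the structure of the `state_size` argument.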
The most basic RNN cell.
Most basic RNN: output = new_state = act(W * input + U * state + B).
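As an illustration of the formula above, here is a minimal pure-Python sketch of one step for a single example (no batch dimension). The helper names `matvec` and `basic_rnn_step`, the weight layout, and the toy sizes are assumptions made for this sketch; they are not part of the library.

```python
import math


def matvec(matrix, vector):
    """Multiply a [rows x cols] nested-list matrix by a length-cols vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def basic_rnn_step(x, state, W, U, B, act=math.tanh):
    """One step for a single example: act(W * x + U * state + B)."""
    wx = matvec(W, x)
    us = matvec(U, state)
    new_state = [act(a + b + c) for a, b, c in zip(wx, us, B)]
    # The output and the new state are the same vector.
    return new_state, new_state


# Hypothetical sizes: 2 inputs, 3 units.
W = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.3]]
U = [[0.0] * 3 for _ in range(3)]
B = [0.0, 0.0, 0.0]
output, new_state = basic_rnn_step([1.0, 2.0], [0.0, 0.0, 0.0], W, U, B)
```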
tf.contrib.rnn.BasicRNNCell.__init__(num_units, input_size=None, activation=tanh) {#BasicRNNCell.init}
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Basic LSTM recurrent network cell.
The implementation is based on: http://arxiv.org/abs/1409.2329.
We add forget_bias (default: 1) to the biases of the forget gate in order to reduce the scale of forgetting at the beginning of training.
It does not allow cell clipping or a projection layer, and it does not use peephole connections: it is the basic baseline.
For advanced models, please use the full LSTMCell that follows.
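The effect of forget_bias can be sketched for a single LSTM unit. This assumes the standard LSTM gate equations, with the bias added to the forget-gate pre-activation; the function name `lstm_unit_step` and the scalar formulation are illustrative assumptions, not the library's code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_unit_step(i, j, f, o, c_prev, forget_bias=1.0):
    """Update one LSTM unit from gate pre-activations i, j, f, o."""
    forget = sigmoid(f + forget_bias)
    new_c = c_prev * forget + sigmoid(i) * math.tanh(j)
    new_h = math.tanh(new_c) * sigmoid(o)
    return new_c, new_h


# With all pre-activations at zero, forget_bias=1.0 keeps about 73% of the
# previous cell state, whereas forget_bias=0.0 keeps exactly half.
c_biased, _ = lstm_unit_step(0.0, 0.0, 0.0, 0.0, c_prev=1.0, forget_bias=1.0)
c_plain, _ = lstm_unit_step(0.0, 0.0, 0.0, 0.0, c_prev=1.0, forget_bias=0.0)
```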
Long short-term memory cell (LSTM).
tf.contrib.rnn.BasicLSTMCell.__init__(num_units, forget_bias=1.0, input_size=None, state_is_tuple=True, activation=tanh) {#BasicLSTMCell.init}
Initialize the basic LSTM cell.
- num_units: int, The number of units in the LSTM cell.
- forget_bias: float, The bias added to forget gates (see above).
- input_size: Deprecated and unused.
- state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state. If False, they are concatenated along the column axis. The latter behavior will soon be deprecated.
- activation: Activation function of the inner states.
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078).
Gated recurrent unit (GRU) with nunits cells.
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Long short-term memory unit (LSTM) recurrent network cell.
The default non-peephole implementation is based on:
http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf
S. Hochreiter and J. Schmidhuber. "Long Short-Term Memory". Neural Computation, 9(8):1735-1780, 1997.
The peephole implementation is based on:
https://research.google.com/pubs/archive/43905.pdf
Hasim Sak, Andrew Senior, and Francoise Beaufays. "Long short-term memory recurrent neural network architectures for large scale acoustic modeling." INTERSPEECH, 2014.
The class uses optional peep-hole connections, optional cell clipping, and an optional projection layer.
Run one step of LSTM.
- inputs: input Tensor, 2D, batch x num_units.
- state: if state_is_tuple is False, this must be a state Tensor, 2-D, batch x state_size. If state_is_tuple is True, this must be a tuple of state Tensors, both 2-D, with column sizes c_state and m_state.
- scope: VariableScope for the created subgraph; defaults to "lstm_cell".

A tuple containing:

- A 2-D, [batch x output_dim], Tensor representing the output of the LSTM after reading inputs when previous state was state. Here output_dim is: num_proj if num_proj was set, num_units otherwise.
- Tensor(s) representing the new state of LSTM after reading inputs when the previous state was state. Same type and shape(s) as state.

ValueError: If input size cannot be inferred from inputs via static shape inference.
tf.contrib.rnn.LSTMCell.__init__(num_units, input_size=None, use_peepholes=False, cell_clip=None, initializer=None, num_proj=None, proj_clip=None, num_unit_shards=None, num_proj_shards=None, forget_bias=1.0, state_is_tuple=True, activation=tanh) {#LSTMCell.init}
Initialize the parameters for an LSTM cell.
- num_units: int, The number of units in the LSTM cell.
- input_size: Deprecated and unused.
- use_peepholes: bool, set True to enable diagonal/peephole connections.
- cell_clip: (optional) A float value, if provided the cell state is clipped by this value prior to the cell output activation.
- initializer: (optional) The initializer to use for the weight and projection matrices.
- num_proj: (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed.
- proj_clip: (optional) A float value. If num_proj > 0 and proj_clip is provided, then the projected values are clipped elementwise to within [-proj_clip, proj_clip].
- num_unit_shards: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.
- num_proj_shards: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.
- forget_bias: Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of the training.
- state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state. If False, they are concatenated along the column axis. This latter behavior will soon be deprecated.
- activation: Activation function of the inner states.
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
LSTM unit with layer normalization and recurrent dropout.
This class adds layer normalization and recurrent dropout to a basic LSTM unit. Layer normalization implementation is based on:
https://arxiv.org/abs/1607.06450.
"Layer Normalization" Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
and is applied before the internal nonlinearities. Recurrent dropout is based on:
https://arxiv.org/abs/1603.05118
"Recurrent Dropout without Memory Loss" Stanislau Semeniuta, Aliaksei Severyn, Erhardt Barth.
tf.contrib.rnn.LayerNormBasicLSTMCell.__call__(inputs, state, scope=None) {#LayerNormBasicLSTMCell.call}
LSTM cell with layer normalization and recurrent dropout.
tf.contrib.rnn.LayerNormBasicLSTMCell.__init__(num_units, forget_bias=1.0, input_size=None, activation=tanh, layer_norm=True, norm_gain=1.0, norm_shift=0.0, dropout_keep_prob=1.0, dropout_prob_seed=None) {#LayerNormBasicLSTMCell.init}
Initializes the basic LSTM cell.
- num_units: int, The number of units in the LSTM cell.
- forget_bias: float, The bias added to forget gates (see above).
- input_size: Deprecated and unused.
- activation: Activation function of the inner states.
- layer_norm: If True, layer normalization will be applied.
- norm_gain: float, The layer normalization gain initial value. If layer_norm has been set to False, this argument will be ignored.
- norm_shift: float, The layer normalization shift initial value. If layer_norm has been set to False, this argument will be ignored.
- dropout_keep_prob: unit Tensor or float between 0 and 1 representing the recurrent dropout keep probability. If float and 1.0, no dropout will be applied.
- dropout_prob_seed: (optional) integer, the randomness seed.
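The roles of the gain and shift values can be sketched with a plain layer-normalization function over one vector of pre-activations. This is a minimal sketch assuming the usual formulation (normalize to zero mean and unit variance across units, then scale and shift); the function name `layer_norm` and the `epsilon` term are assumptions for illustration, and in the cell itself the gain and shift are learned variables that only start at the given values.

```python
import math


def layer_norm(values, gain=1.0, shift=0.0, epsilon=1e-12):
    """Normalize values across units, then apply gain and shift."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    scale = gain / math.sqrt(variance + epsilon)
    return [(v - mean) * scale + shift for v in values]


normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
shifted = layer_norm([1.0, 2.0, 3.0, 4.0], gain=2.0, shift=1.0)
```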
tf.contrib.rnn.LayerNormBasicLSTMCell.zero_state(batch_size, dtype) {#LayerNormBasicLSTMCell.zero_state}
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Tuple used by LSTM Cells for state_size, zero_state, and output state.
Stores two elements: (c, h), in that order.
Only used when state_is_tuple=True.
Return self as a plain tuple. Used by copy and pickle.
Exclude the OrderedDict from pickling
Create new instance of LSTMStateTuple(c, h)
Return a nicely formatted representation string
Alias for field number 0
Alias for field number 1
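The method descriptions above are those of a Python named tuple. A minimal stand-in built with `collections.namedtuple` shows how the two fields behave; `StateTupleSketch` is a hypothetical name used only for this sketch, and lists stand in for tensors.

```python
import collections

# Stand-in with the same field names and order described above.
StateTupleSketch = collections.namedtuple("StateTupleSketch", ("c", "h"))

state = StateTupleSketch(c=[0.0, 0.0], h=[1.0, 1.0])
c, h = state                        # unpacks in (c, h) order
first, second = state[0], state[1]  # field numbers 0 and 1
```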
RNN cell composed sequentially of multiple simple cells.
Run this multi-layer cell on inputs, starting from state.
Create a RNN cell composed sequentially of a number of RNNCells.
- cells: list of RNNCells that will be composed in this order.
- state_is_tuple: If True, accepted and returned states are n-tuples, where n = len(cells). If False, the states are all concatenated along the column axis. This latter behavior will soon be deprecated.

ValueError: if cells is empty (not allowed), or at least one of the cells returns a state tuple but the flag state_is_tuple is False.
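Sequential composition can be sketched in a few lines: each cell's output feeds the next cell, and the per-cell states are kept as a tuple, as in the state_is_tuple=True layout. The function `multi_cell_step` and the toy cells are hypothetical and operate on plain numbers rather than tensors.

```python
def multi_cell_step(cells, inputs, states):
    """Run cells in order; each cell's output is the next cell's input.

    cells: list of callables (inputs, state) -> (output, new_state).
    states: tuple with one state per cell.
    """
    if not cells:
        raise ValueError("Must specify at least one cell.")
    current = inputs
    new_states = []
    for cell, state in zip(cells, states):
        current, new_state = cell(current, state)
        new_states.append(new_state)
    return current, tuple(new_states)


# Hypothetical toy cells that operate on plain numbers.
def add_cell(x, s):
    return x + s, s + 1


def double_cell(x, s):
    return 2 * x, s


output, new_states = multi_cell_step([add_cell, double_cell], 3, (10, 0))
```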
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
This is a helper class that provides housekeeping for LSTM cells.
This may be useful for alternative LSTM cells and similar types of cells.
Subclasses must implement the _call_cell method and the num_units property.
tf.contrib.rnn.LSTMBlockWrapper.__call__(inputs, initial_state=None, dtype=None, sequence_length=None, scope=None) {#LSTMBlockWrapper.call}
Run this LSTM on inputs, starting from the given state.
- inputs: 3-D tensor with shape [time_len, batch_size, input_size] or a list of time_len tensors of shape [batch_size, input_size].
- initial_state: a tuple (initial_cell_state, initial_output) with tensors of shape [batch_size, self._num_units]. If this is not provided, the cell is expected to create a zero initial state of type dtype.
- dtype: The data type for the initial state and expected output. Required if initial_state is not provided or RNN state has a heterogeneous dtype.
- sequence_length: Specifies the length of each sequence in inputs. An int32 or int64 vector (tensor) size [batch_size], values in [0, time_len). Defaults to time_len for each element.
- scope: VariableScope for the created subgraph; defaults to class name.

A pair containing:

- Output: A 3-D tensor of shape [time_len, batch_size, output_size] or a list of time_len tensors of shape [batch_size, output_size], to match the type of the inputs.
- Final state: a tuple (cell_state, output) matching initial_state.

ValueError: in case of shape mismatches
Number of units in this cell (output dimension).
Operator adding dropout to inputs and outputs of the given cell.
Run the cell with the declared dropouts.
tf.contrib.rnn.DropoutWrapper.__init__(cell, input_keep_prob=1.0, output_keep_prob=1.0, seed=None) {#DropoutWrapper.init}
Create a cell with added input and/or output dropout.
Dropout is never used on the state.
- cell: an RNNCell, dropout is added to its inputs and outputs.
- input_keep_prob: unit Tensor or float between 0 and 1, input keep probability; if it is float and 1, no input dropout will be added.
- output_keep_prob: unit Tensor or float between 0 and 1, output keep probability; if it is float and 1, no output dropout will be added.
- seed: (optional) integer, the randomness seed.

TypeError: if cell is not an RNNCell.
ValueError: if keep_prob is not between 0 and 1.
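The wrapper's behavior can be sketched in pure Python: dropout is applied to the inputs and to the outputs, while the state passes through untouched. This sketch assumes the common "inverted dropout" convention (surviving values are divided by the keep probability); `dropout`, `dropout_wrapped_step`, and `identity_cell` are hypothetical names, and lists stand in for tensors.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and rescale the survivors."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1].")
    if keep_prob == 1.0:
        return list(values)  # no dropout is applied
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def dropout_wrapped_step(cell, inputs, state, input_keep_prob=1.0,
                         output_keep_prob=1.0, seed=None):
    """Apply dropout to the inputs and outputs of cell, never to the state."""
    rng = random.Random(seed)
    output, new_state = cell(dropout(inputs, input_keep_prob, rng), state)
    return dropout(output, output_keep_prob, rng), new_state


def identity_cell(inputs, state):
    # Hypothetical toy cell: echoes its inputs and leaves the state unchanged.
    return list(inputs), state


out, new_state = dropout_wrapped_step(identity_cell, [1.0, 2.0, 3.0, 4.0],
                                      state=[5.0], output_keep_prob=0.5, seed=0)
```

With output_keep_prob=0.5, every element of `out` is either zero or twice the corresponding input, and `new_state` is returned unchanged.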
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Operator adding input embedding to the given cell.
Note: in many cases it may be more efficient to not use this wrapper, but instead concatenate the whole sequence of your inputs in time, do the embedding on this batch-concatenated sequence, then split it and feed into your RNN.
Run the cell on embedded inputs.
tf.contrib.rnn.EmbeddingWrapper.__init__(cell, embedding_classes, embedding_size, initializer=None) {#EmbeddingWrapper.init}
Create a cell with an added input embedding.
- cell: an RNNCell, an embedding will be put before its inputs.
- embedding_classes: integer, how many symbols will be embedded.
- embedding_size: integer, the size of the vectors we embed into.
- initializer: an initializer to use when creating the embedding; if None, the initializer from variable scope or a default one is used.

TypeError: if cell is not an RNNCell.
ValueError: if embedding_classes is not positive.
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Operator adding an input projection to the given cell.
Note: in many cases it may be more efficient to not use this wrapper, but instead concatenate the whole sequence of your inputs in time, do the projection on this batch-concatenated sequence, then split it.
tf.contrib.rnn.InputProjectionWrapper.__call__(inputs, state, scope=None) {#InputProjectionWrapper.call}
Run the input projection and then the cell.
tf.contrib.rnn.InputProjectionWrapper.__init__(cell, num_proj, input_size=None) {#InputProjectionWrapper.init}
Create a cell with input projection.
- cell: an RNNCell, a projection of inputs is added before it.
- num_proj: Python integer. The dimension to project to.
- input_size: Deprecated and unused.
TypeError: if cell is not an RNNCell.
tf.contrib.rnn.InputProjectionWrapper.zero_state(batch_size, dtype) {#InputProjectionWrapper.zero_state}
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Operator adding an output projection to the given cell.
Note: in many cases it may be more efficient to not use this wrapper, but instead concatenate the whole sequence of your outputs in time, do the projection on this batch-concatenated sequence, then split it if needed or directly feed into a softmax.
tf.contrib.rnn.OutputProjectionWrapper.__call__(inputs, state, scope=None) {#OutputProjectionWrapper.call}
Run the cell and output projection on inputs, starting from state.
Create a cell with output projection.
- cell: an RNNCell, a projection to output_size is added to it.
- output_size: integer, the size of the output after projection.

TypeError: if cell is not an RNNCell.
ValueError: if output_size is not positive.
tf.contrib.rnn.OutputProjectionWrapper.zero_state(batch_size, dtype) {#OutputProjectionWrapper.zero_state}
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Operator that ensures an RNNCell runs on a particular device.
Run the cell on specified device.
Construct a DeviceWrapper for cell with device device.
Ensures the wrapped cell is called with tf.device(device).
- cell: An instance of RNNCell.
- device: A device string or function, for passing to tf.device.
Integer or TensorShape: size of outputs produced by this cell.
size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
RNNCell wrapper that ensures cell inputs are added to the outputs.
Run the cell and add its inputs to its outputs.
- inputs: cell inputs.
- state: cell state.
- scope: optional cell scope.

Tuple of cell outputs and new state.

TypeError: If cell inputs and outputs have different structure (type).
ValueError: If cell inputs and outputs have different structure (value).
Constructs a ResidualWrapper for cell.
- cell: An instance of RNNCell.
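The residual connection described for this wrapper amounts to adding the inputs to the outputs elementwise. A minimal sketch, using flat lists in place of tensors and the hypothetical names `residual_step` and `halving_cell`:

```python
def residual_step(cell, inputs, state):
    """Run cell and add its inputs to its outputs elementwise."""
    outputs, new_state = cell(inputs, state)
    if len(outputs) != len(inputs):
        raise ValueError("Cell inputs and outputs must have the same size.")
    return [o + i for o, i in zip(outputs, inputs)], new_state


def halving_cell(inputs, state):
    # Hypothetical toy cell: halves its inputs and counts calls in the state.
    return [0.5 * v for v in inputs], state + 1


outputs, new_state = residual_step(halving_cell, [2.0, 4.0], 0)
```

The addition only makes sense when the cell's output has the same structure as its input, which is why the wrapper raises an error otherwise.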
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Basic LSTM recurrent network cell.
The implementation is based on: http://arxiv.org/abs/1409.2329.
We add forget_bias (default: 1) to the biases of the forget gate in order to
reduce the scale of forgetting in the beginning of the training.
Unlike core_rnn_cell.LSTMCell, this is a monolithic op and should be much
faster. The weight and bias matrices should be compatible as long as the
variable scope matches.
Long short-term memory cell (LSTM).
tf.contrib.rnn.LSTMBlockCell.__init__(num_units, forget_bias=1.0, use_peephole=False) {#LSTMBlockCell.init}
Initialize the basic LSTM cell.
- num_units: int, The number of units in the LSTM cell.
- forget_bias: float, The bias added to forget gates (see above).
- use_peephole: Whether to use peephole connections or not.
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Block GRU cell implementation.
The implementation is based on: http://arxiv.org/abs/1406.1078
Computes the GRU cell forward propagation for one time step.
Biases are initialized with:

- b_ru: constant_initializer(1.0)
- b_c: constant_initializer(0.0)

This kernel op implements the following mathematical equations:

    x_h_prev = [x, h_prev]
    [r_bar u_bar] = x_h_prev * w_ru + b_ru
    r = sigmoid(r_bar)
    u = sigmoid(u_bar)
    h_prevr = h_prev \circ r
    x_h_prevr = [x, h_prevr]
    c_bar = x_h_prevr * w_c + b_c
    c = tanh(c_bar)
    h = (1-u) \circ c + u \circ h_prev
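The equations above can be traced step by step in pure Python for a single example. This is a sketch, not the kernel: lists stand in for tensors, the helper names `vecmat` and `gru_step` are hypothetical, and the split of `[r_bar u_bar]` into a first half (reset) and second half (update) is assumed from the notation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def vecmat(vector, matrix):
    """Multiply a length-n row vector by an [n x m] nested-list matrix."""
    return [sum(v * row[k] for v, row in zip(vector, matrix))
            for k in range(len(matrix[0]))]


def gru_step(x, h_prev, w_ru, b_ru, w_c, b_c):
    """One GRU step for a single example, following the equations above."""
    n = len(h_prev)
    x_h_prev = x + h_prev                                # [x, h_prev]
    ru_bar = [a + b for a, b in zip(vecmat(x_h_prev, w_ru), b_ru)]
    r = [sigmoid(v) for v in ru_bar[:n]]                 # reset gate
    u = [sigmoid(v) for v in ru_bar[n:]]                 # update gate
    h_prevr = [h * g for h, g in zip(h_prev, r)]         # h_prev \circ r
    x_h_prevr = x + h_prevr                              # [x, h_prevr]
    c_bar = [a + b for a, b in zip(vecmat(x_h_prevr, w_c), b_c)]
    c = [math.tanh(v) for v in c_bar]                    # candidate state
    return [(1 - ui) * ci + ui * hi for ui, ci, hi in zip(u, c, h_prev)]


# Hypothetical sizes: 2 inputs, 2 units; zero weights, documented bias values.
w_ru = [[0.0] * 4 for _ in range(4)]
w_c = [[0.0] * 2 for _ in range(4)]
h = gru_step([1.0, -1.0], [0.5, -0.5], w_ru, [1.0] * 4, w_c, [0.0] * 2)
```

With zero weights the candidate `c` is zero, so the new state is just `sigmoid(1.0)` times the previous state: the b_ru initial value of 1.0 biases the update gate toward keeping `h_prev`.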
GRU cell.
Initialize the Block GRU cell.
cell_size: int, GRU cell size.
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Abstract object representing a fused RNN cell.
A fused RNN cell represents the entire RNN expanded over the time dimension. In effect, this represents an entire recurrent network.
Unlike RNN cells which are subclasses of rnn_cell.RNNCell, a FusedRNNCell
operates on the entire time sequence at once, by putting the loop over time
inside the cell. This usually leads to much more efficient, but more complex
and less flexible implementations.
Every FusedRNNCell must implement __call__ with the following signature.
tf.contrib.rnn.FusedRNNCell.__call__(inputs, initial_state=None, dtype=None, sequence_length=None, scope=None) {#FusedRNNCell.call}
Run this fused RNN on inputs, starting from the given state.
- inputs: 3-D tensor with shape [time_len x batch_size x input_size] or a list of time_len tensors of shape [batch_size x input_size].
- initial_state: either a tensor with shape [batch_size x state_size] or a tuple with shapes [batch_size x s] for s in state_size, if the cell takes tuples. If this is not provided, the cell is expected to create a zero initial state of type dtype.
- dtype: The data type for the initial state and expected output. Required if initial_state is not provided or RNN state has a heterogeneous dtype.
- sequence_length: Specifies the length of each sequence in inputs. An int32 or int64 vector (tensor) size [batch_size], values in [0, time_len). Defaults to time_len for each element.
- scope: VariableScope or string for the created subgraph; defaults to class name.

A pair containing:

- Output: A 3-D tensor of shape [time_len x batch_size x output_size] or a list of time_len tensors of shape [batch_size x output_size], to match the type of the inputs.
- Final state: Either a single 2-D tensor, or a tuple of tensors matching the arity and shapes of initial_state.
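The difference between a per-step cell and a fused cell is where the loop over time lives. The following sketch shows the fused calling convention with the loop inside a single function; `run_over_time` and `running_sum_cell` are hypothetical names, sequence_length handling is omitted, and plain numbers stand in for tensors.

```python
def run_over_time(cell, inputs, initial_state):
    """Apply cell to each time step in order and collect the outputs.

    inputs: list of time_len per-step inputs.
    Returns (outputs, final_state); outputs has one entry per time step.
    """
    state = initial_state
    outputs = []
    for step_input in inputs:
        output, state = cell(step_input, state)
        outputs.append(output)
    return outputs, state


def running_sum_cell(x, state):
    # Hypothetical toy cell: the state accumulates the inputs seen so far.
    total = state + x
    return total, total


outputs, final_state = run_over_time(running_sum_cell, [1, 2, 3], 0)
```

A real fused implementation replaces this Python loop with a single op, which is where the efficiency gain comes from.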
This is an adaptor for RNNCell classes to be used with FusedRNNCell.
tf.contrib.rnn.FusedRNNCellAdaptor.__call__(inputs, initial_state=None, dtype=None, sequence_length=None, scope=None) {#FusedRNNCellAdaptor.call}
tf.contrib.rnn.FusedRNNCellAdaptor.__init__(cell, use_dynamic_rnn=False) {#FusedRNNCellAdaptor.init}
Initialize the adaptor.
- cell: an instance of a subclass of rnn_cell.RNNCell.
- use_dynamic_rnn: whether to use dynamic (or static) RNN.
This is an adaptor to time-reverse a FusedRNNCell.
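For example,

    import tensorflow as tf

    # `inputs` is assumed to be a 3-D float32 tensor [time_len, batch_size, input_size].
    cell = tf.contrib.rnn.BasicRNNCell(10)
    fw_lstm = tf.contrib.rnn.FusedRNNCellAdaptor(cell, use_dynamic_rnn=True)
    bw_lstm = tf.contrib.rnn.TimeReversedFusedRNN(fw_lstm)
    # dtype is required because no initial_state is provided.
    fw_out, fw_state = fw_lstm(inputs, dtype=tf.float32)
    bw_out, bw_state = bw_lstm(inputs, dtype=tf.float32)

tf.contrib.rnn.TimeReversedFusedRNN.__call__(inputs, initial_state=None, dtype=None, sequence_length=None, scope=None) {#TimeReversedFusedRNN.call}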
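The idea of time reversal can be sketched independently of TensorFlow: reverse the inputs along time, run the wrapped function, and reverse the outputs back so they line up with the original time steps. This sketch ignores sequence_length (all sequences are assumed to have full length), and `time_reversed` and `running_sum` are hypothetical names.

```python
def time_reversed(fused_fn):
    """Wrap fused_fn so it reads the sequence from the last step to the first."""
    def wrapped(inputs, initial_state):
        outputs, final_state = fused_fn(list(reversed(inputs)), initial_state)
        # Reverse the outputs back so outputs[t] lines up with inputs[t].
        return list(reversed(outputs)), final_state
    return wrapped


def running_sum(inputs, initial_state):
    # Hypothetical fused function: cumulative sum over the time dimension.
    outputs, state = [], initial_state
    for x in inputs:
        state = state + x
        outputs.append(state)
    return outputs, state


fw_out, fw_state = running_sum([1, 2, 3], 0)
bw_out, bw_state = time_reversed(running_sum)([1, 2, 3], 0)
```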
FusedRNNCell implementation of LSTM.
This is an extremely efficient LSTM implementation that uses a single TF op for the entire LSTM. It should be both faster and more memory-efficient than LSTMBlockCell defined above.
The implementation is based on: http://arxiv.org/abs/1409.2329.
We add forget_bias (default: 1) to the biases of the forget gate in order to reduce the scale of forgetting in the beginning of the training.
The variable naming is consistent with core_rnn_cell.LSTMCell.
tf.contrib.rnn.LSTMBlockFusedCell.__call__(inputs, initial_state=None, dtype=None, sequence_length=None, scope=None) {#LSTMBlockFusedCell.call}
Run this LSTM on inputs, starting from the given state.
- inputs: 3-D tensor with shape [time_len, batch_size, input_size] or a list of time_len tensors of shape [batch_size, input_size].
- initial_state: a tuple (initial_cell_state, initial_output) with tensors of shape [batch_size, self._num_units]. If this is not provided, the cell is expected to create a zero initial state of type dtype.
- dtype: The data type for the initial state and expected output. Required if initial_state is not provided or RNN state has a heterogeneous dtype.
- sequence_length: Specifies the length of each sequence in inputs. An int32 or int64 vector (tensor) size [batch_size], values in [0, time_len). Defaults to time_len for each element.
- scope: VariableScope for the created subgraph; defaults to class name.

A pair containing:

- Output: A 3-D tensor of shape [time_len, batch_size, output_size] or a list of time_len tensors of shape [batch_size, output_size], to match the type of the inputs.
- Final state: a tuple (cell_state, output) matching initial_state.

ValueError: in case of shape mismatches
tf.contrib.rnn.LSTMBlockFusedCell.__init__(num_units, forget_bias=1.0, cell_clip=None, use_peephole=False) {#LSTMBlockFusedCell.init}
Initialize the LSTM cell.
- num_units: int, The number of units in the LSTM cell.
- forget_bias: float, The bias added to forget gates (see above).
- cell_clip: clip the cell to this value. Defaults to 3.
- use_peephole: Whether to use peephole connections or not.
Number of units in this cell (output dimension).
Long short-term memory unit (LSTM) recurrent network cell.
The default non-peephole implementation is based on:
http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf
S. Hochreiter and J. Schmidhuber. "Long Short-Term Memory". Neural Computation, 9(8):1735-1780, 1997.
The peephole implementation is based on:
https://research.google.com/pubs/archive/43905.pdf
Hasim Sak, Andrew Senior, and Francoise Beaufays. "Long short-term memory recurrent neural network architectures for large scale acoustic modeling." INTERSPEECH, 2014.
The coupling of input and forget gate is based on:
http://arxiv.org/pdf/1503.04069.pdf
Greff et al. "LSTM: A Search Space Odyssey"
The class uses optional peep-hole connections, and an optional projection layer.
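Coupling the input and forget gates means the cell has no separate input gate: the input gate is taken to be one minus the forget gate. A minimal scalar sketch of the resulting cell-state update follows, assuming the coupled formulation from the cited Greff et al. paper and a forget bias added to the forget-gate pre-activation; `coupled_cell_update` is a hypothetical name, and peepholes and projection are omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def coupled_cell_update(c_prev, f, j, forget_bias=1.0):
    """Update one cell state with coupled gates: input_gate = 1 - forget_gate."""
    forget = sigmoid(f + forget_bias)
    input_gate = 1.0 - forget  # coupled: no separate input gate is computed
    return forget * c_prev + input_gate * math.tanh(j)


# The new state is a weighted average of the old state and the candidate.
c_new = coupled_cell_update(c_prev=1.0, f=0.0, j=0.0, forget_bias=0.0)
```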
tf.contrib.rnn.CoupledInputForgetGateLSTMCell.__call__(inputs, state, scope=None) {#CoupledInputForgetGateLSTMCell.call}
Run one step of LSTM.
- inputs: input Tensor, 2D, batch x num_units.
- state: if state_is_tuple is False, this must be a state Tensor, 2-D, batch x state_size. If state_is_tuple is True, this must be a tuple of state Tensors, both 2-D, with column sizes c_state and m_state.
- scope: VariableScope for the created subgraph; defaults to "LSTMCell".

A tuple containing:

- A 2-D, [batch x output_dim], Tensor representing the output of the LSTM after reading inputs when previous state was state. Here output_dim is: num_proj if num_proj was set, num_units otherwise.
- Tensor(s) representing the new state of LSTM after reading inputs when the previous state was state. Same type and shape(s) as state.

ValueError: If input size cannot be inferred from inputs via static shape inference.
tf.contrib.rnn.CoupledInputForgetGateLSTMCell.__init__(num_units, use_peepholes=False, initializer=None, num_proj=None, proj_clip=None, num_unit_shards=1, num_proj_shards=1, forget_bias=1.0, state_is_tuple=False, activation=tanh) {#CoupledInputForgetGateLSTMCell.init}
Initialize the parameters for an LSTM cell.
- num_units: int, The number of units in the LSTM cell.
- use_peepholes: bool, set True to enable diagonal/peephole connections.
- initializer: (optional) The initializer to use for the weight and projection matrices.
- num_proj: (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed.
- proj_clip: (optional) A float value. If num_proj > 0 and proj_clip is provided, then the projected values are clipped elementwise to within [-proj_clip, proj_clip].
- num_unit_shards: How to split the weight matrix. If >1, the weight matrix is stored across num_unit_shards.
- num_proj_shards: How to split the projection matrix. If >1, the projection matrix is stored across num_proj_shards.
- forget_bias: Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of the training.
- state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state. By default (False), they are concatenated along the column axis. This default behavior will soon be deprecated.
- activation: Activation function of the inner states.
tf.contrib.rnn.CoupledInputForgetGateLSTMCell.output_size {#CoupledInputForgetGateLSTMCell.output_size}
tf.contrib.rnn.CoupledInputForgetGateLSTMCell.state_size {#CoupledInputForgetGateLSTMCell.state_size}
tf.contrib.rnn.CoupledInputForgetGateLSTMCell.zero_state(batch_size, dtype) {#CoupledInputForgetGateLSTMCell.zero_state}
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Time-Frequency Long short-term memory unit (LSTM) recurrent network cell.
This implementation is based on:
Tara N. Sainath and Bo Li "Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks." submitted to INTERSPEECH, 2016.
It uses peep-hole connections and optional cell clipping.
Run one step of LSTM.
- inputs: input Tensor, 2D, batch x num_units.
- state: state Tensor, 2D, batch x state_size.
- scope: VariableScope for the created subgraph; defaults to "TimeFreqLSTMCell".
A tuple containing:
- A 2D, batch x output_dim, Tensor representing the output of the LSTM after reading "inputs" when previous state was "state". Here output_dim is num_units.
- A 2D, batch x state_size, Tensor representing the new state of LSTM after reading "inputs" when previous state was "state".
ValueError: if an input_size was specified and the provided inputs have a different dimension.
tf.contrib.rnn.TimeFreqLSTMCell.__init__(num_units, use_peepholes=False, cell_clip=None, initializer=None, num_unit_shards=1, forget_bias=1.0, feature_size=None, frequency_skip=None) {#TimeFreqLSTMCell.init}
Initialize the parameters for an LSTM cell.
- num_units: int, The number of units in the LSTM cell.
- use_peepholes: bool, set True to enable diagonal/peephole connections.
- cell_clip: (optional) A float value, if provided the cell state is clipped by this value prior to the cell output activation.
- initializer: (optional) The initializer to use for the weight and projection matrices.
- num_unit_shards: int, How to split the weight matrix. If >1, the weight matrix is stored across num_unit_shards.
- forget_bias: float, Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of the training.
- feature_size: int, The size of the input feature the LSTM spans over.
- frequency_skip: int, The amount the LSTM filter is shifted by in frequency.
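One plausible reading of feature_size and frequency_skip is a window of width feature_size that slides along the input's frequency axis in steps of frequency_skip. The sketch below only illustrates that reading; the function name frequency_windows and the window-count formula are assumptions, not taken from the text above.

```python
def frequency_windows(input_size, feature_size, frequency_skip):
    """List the (start, end) index ranges of each sliding frequency window.

    Assumes windows that do not fit entirely inside the input are dropped.
    """
    num_windows = (input_size - feature_size) // frequency_skip + 1
    return [
        (k * frequency_skip, k * frequency_skip + feature_size)
        for k in range(num_windows)
    ]


# Hypothetical sizes: 8 input features, windows of 4, shifted by 2.
windows = frequency_windows(input_size=8, feature_size=4, frequency_skip=2)
```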
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.
If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Grid Long short-term memory unit (LSTM) recurrent network cell.
The default is based on: Nal Kalchbrenner, Ivo Danihelka and Alex Graves "Grid Long Short-Term Memory," Proc. ICLR 2016. http://arxiv.org/abs/1507.01526
When peephole connections are used, the implementation is based on: Tara N. Sainath and Bo Li "Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks." submitted to INTERSPEECH, 2016.
The code uses optional peephole connections, shared_weights and cell clipping.
Run one step of LSTM.
- inputs: input Tensor, 2D, [batch, feature_size].
- state: Tensor or tuple of Tensors, 2D, [batch, state_size], depends on the flag self._state_is_tuple.
- scope: (optional) VariableScope for the created subgraph; if None, it defaults to "GridLSTMCell".
A tuple containing:
- A 2D, [batch, output_dim], Tensor representing the output of the LSTM after reading "inputs" when previous state was "state". Here output_dim is num_units.
- A 2D, [batch, state_size], Tensor representing the new state of LSTM after reading "inputs" when previous state was "state".
ValueError: if an input_size was specified and the provided inputs have a different dimension.
tf.contrib.rnn.GridLSTMCell.__init__(num_units, use_peepholes=False, share_time_frequency_weights=False, cell_clip=None, initializer=None, num_unit_shards=1, forget_bias=1.0, feature_size=None, frequency_skip=None, num_frequency_blocks=None, start_freqindex_list=None, end_freqindex_list=None, couple_input_forget_gates=False, state_is_tuple=False) {#GridLSTMCell.init}
Initialize the parameters for an LSTM cell.
- num_units: int, The number of units in the LSTM cell.
- use_peepholes: (optional) bool, default False. Set True to enable diagonal/peephole connections.
- share_time_frequency_weights: (optional) bool, default False. Set True to enable shared cell weights between time and frequency LSTMs.
- cell_clip: (optional) A float value, default None, if provided the cell state is clipped by this value prior to the cell output activation.
- initializer: (optional) The initializer to use for the weight and projection matrices, default None.
- num_unit_shards: (optional) int, default 1, How to split the weight matrix. If > 1, the weight matrix is stored across num_unit_shards.
- forget_bias: (optional) float, default 1.0, The initial bias of the forget gates, used to reduce the scale of forgetting at the beginning of the training.
- feature_size: (optional) int, default None, The size of the input feature the LSTM spans over.
- frequency_skip: (optional) int, default None, The amount the LSTM filter is shifted by in frequency.
- num_frequency_blocks: [required] A list of frequency blocks needed to cover the whole input feature splitting defined by start_freqindex_list and end_freqindex_list.
- start_freqindex_list: [optional], list of ints, default None, The starting frequency index for each frequency block.
- end_freqindex_list: [optional], list of ints, default None. The ending frequency index for each frequency block.
- couple_input_forget_gates: (optional) bool, default False, Whether to couple the input and forget gates, i.e. f_gate = 1.0 - i_gate, to reduce model parameters and computation cost.
- state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state. By default (False), they are concatenated along the column axis. This default behavior will soon be deprecated.
ValueError: if the num_frequency_blocks list is not specified
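The coupling f_gate = 1.0 - i_gate mentioned for couple_input_forget_gates can be sketched in a few lines. This is an illustration under the assumption that the input gate is a sigmoid of some scalar pre-activation; the function names are hypothetical and the real cell works on tensors.

```python
import math


def sigmoid(x):
    """Logistic function used for gate activations."""
    return 1.0 / (1.0 + math.exp(-x))


def coupled_gates(input_gate_preactivation):
    """Compute the input gate and derive the forget gate from it."""
    i_gate = sigmoid(input_gate_preactivation)
    f_gate = 1.0 - i_gate  # no separate forget-gate parameters are needed
    return i_gate, f_gate


i_gate, f_gate = coupled_gates(0.0)
```

Because the forget gate is derived rather than learned, the two gates always sum to one, which is where the saving in parameters and computation comes from.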
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.
If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Basic attention cell wrapper.
Implementation based on https://arxiv.org/abs/1409.0473.
tf.contrib.rnn.AttentionCellWrapper.__call__(inputs, state, scope=None) {#AttentionCellWrapper.call}
Long short-term memory cell with attention (LSTMA).
tf.contrib.rnn.AttentionCellWrapper.__init__(cell, attn_length, attn_size=None, attn_vec_size=None, input_size=None, state_is_tuple=False) {#AttentionCellWrapper.init}
Create a cell with attention.
- cell: an RNNCell, an attention is added to it.
- attn_length: integer, the size of an attention window.
- attn_size: integer, the size of an attention vector. Equal to cell.output_size by default.
- attn_vec_size: integer, the number of convolutional features calculated on attention state and a size of the hidden layer built from base cell state. Equal to attn_size by default.
- input_size: integer, the size of a hidden linear layer, built from inputs and attention. Derived from the input tensor by default.
- state_is_tuple: If True, accepted and returned states are n-tuples, where n = len(cells). By default (False), the states are all concatenated along the column axis.
- TypeError: if cell is not an RNNCell.
- ValueError: if cell returns a state tuple but the flag state_is_tuple is False, or if attn_length is zero or less.
tf.contrib.rnn.AttentionCellWrapper.zero_state(batch_size, dtype) {#AttentionCellWrapper.zero_state}
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.
If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
Wraps step execution in an XLA JIT scope.
Create CompiledWrapper cell.
- cell: Instance of RNNCell.
- compile_stateful: Whether to compile stateful ops like initializers and random number generators (default: False).
Return zero-filled state tensor(s).
- batch_size: int, float, or unit Tensor representing the batch size.
- dtype: the data type to use for the state.
If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size x state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size x s] for each s in state_size.
tf.contrib.rnn.static_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None) {#static_rnn}
Creates a recurrent neural network specified by RNNCell cell.
The simplest form of RNN network generated is:
    state = cell.zero_state(...)
    outputs = []
    for input_ in inputs:
        output, state = cell(input_, state)
        outputs.append(output)
    return (outputs, state)

However, a few other options are available:
An initial state can be provided. If the sequence_length vector is provided, dynamic calculation is performed. This method of calculation does not compute the RNN steps past the maximum sequence length of the minibatch (thus saving computational time), and properly propagates the state at an example's sequence length to the final state output.
The dynamic calculation performed is, at time t for batch row b,
    (output, state)(b, t) =
        (t >= sequence_length(b))
            ? (zeros(cell.output_size), states(b, sequence_length(b) - 1))
            : cell(input(b, t), state(b, t - 1))

- cell: An instance of RNNCell.
- inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size], or a nested tuple of such elements.
- initial_state: (optional) An initial state for the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.
- dtype: (optional) The data type for the initial state and expected output. Required if initial_state is not provided or RNN state has a heterogeneous dtype.
- sequence_length: Specifies the length of each sequence in inputs. An int32 or int64 vector (tensor) of size [batch_size], values in [0, T).
- scope: VariableScope for the created subgraph; defaults to "rnn".
A pair (outputs, state) where:
- outputs is a length T list of outputs (one for each input), or a nested tuple of such elements.
- state is the final state
- TypeError: If cell is not an instance of RNNCell.
- ValueError: If inputs is None or an empty list, or if the input depth (column size) cannot be inferred from inputs via shape inference.
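The dynamic calculation described above (zero output and a frozen state once t reaches a row's sequence length) can be sketched in plain Python. This is a scalar illustration only, not the library's code: toy_cell and masked_static_rnn are hypothetical names, each batch row carries a single float as both input and state, and no computation is skipped past the longest sequence.

```python
def toy_cell(x, state):
    """Hypothetical cell: the state is a running sum, and the output equals it."""
    new_state = state + x
    return new_state, new_state


def masked_static_rnn(cell, inputs, sequence_length, initial_state=0.0):
    """Unroll cell over inputs[t][b], honoring a per-row sequence length."""
    batch_size = len(sequence_length)
    states = [initial_state] * batch_size
    outputs = []
    for t, step_inputs in enumerate(inputs):
        step_outputs = []
        for b in range(batch_size):
            if t >= sequence_length[b]:
                # Past the end of this row: emit zeros, keep the last valid state.
                step_outputs.append(0.0)
            else:
                out, states[b] = cell(step_inputs[b], states[b])
                step_outputs.append(out)
        outputs.append(step_outputs)
    return outputs, states


# T = 3 time steps, batch of 2; the second row has only 1 valid step.
inputs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
outputs, final_state = masked_static_rnn(toy_cell, inputs, [3, 1])
```

The final state of the short row is the state it had at its last valid step, which matches the propagation behavior described for sequence_length.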
tf.contrib.rnn.static_state_saving_rnn(cell, inputs, state_saver, state_name, sequence_length=None, scope=None) {#static_state_saving_rnn}
RNN that accepts a state saver for time-truncated RNN calculation.
- cell: An instance of RNNCell.
- inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size].
- state_saver: A state saver object with methods state and save_state.
- state_name: Python string or tuple of strings. The name to use with the state_saver. If the cell returns tuples of states (i.e., cell.state_size is a tuple) then state_name should be a tuple of strings having the same length as cell.state_size. Otherwise it should be a single string.
- sequence_length: (optional) An int32/int64 vector of size [batch_size]. See the documentation for rnn() for more details about sequence_length.
- scope: VariableScope for the created subgraph; defaults to "rnn".
A pair (outputs, state) where:
- outputs is a length T list of outputs (one for each input).
- state is the final state.
- TypeError: If cell is not an instance of RNNCell.
- ValueError: If inputs is None or an empty list, or if the arity and type of state_name does not match that of cell.state_size.
tf.contrib.rnn.static_bidirectional_rnn(cell_fw, cell_bw, inputs, initial_state_fw=None, initial_state_bw=None, dtype=None, sequence_length=None, scope=None) {#static_bidirectional_rnn}
Creates a bidirectional recurrent neural network.
Similar to the unidirectional case above (rnn) but takes input and builds independent forward and backward RNNs with the final forward and backward outputs depth-concatenated, such that the output will have the format [time][batch][cell_fw.output_size + cell_bw.output_size]. The input_size of forward and backward cell must match. The initial state for both directions is zero by default (but can be set optionally) and no intermediate states are ever returned -- the network is fully unrolled for the given (passed in) length(s) of the sequence(s) or completely unrolled if length(s) is not given.
- cell_fw: An instance of RNNCell, to be used for forward direction.
- cell_bw: An instance of RNNCell, to be used for backward direction.
- inputs: A length T list of inputs, each a tensor of shape [batch_size, input_size], or a nested tuple of such elements.
- initial_state_fw: (optional) An initial state for the forward RNN. This must be a tensor of appropriate type and shape [batch_size, cell_fw.state_size]. If cell_fw.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell_fw.state_size.
- initial_state_bw: (optional) Same as for initial_state_fw, but using the corresponding properties of cell_bw.
- dtype: (optional) The data type for the initial state. Required if either of the initial states are not provided.
- sequence_length: (optional) An int32/int64 vector, size [batch_size], containing the actual lengths for each of the sequences.
- scope: VariableScope for the created subgraph; defaults to "bidirectional_rnn".
A tuple (outputs, output_state_fw, output_state_bw) where:
outputs is a length T list of outputs (one for each input), which
are depth-concatenated forward and backward outputs.
output_state_fw is the final state of the forward rnn.
output_state_bw is the final state of the backward rnn.
- TypeError: If cell_fw or cell_bw is not an instance of RNNCell.
- ValueError: If inputs is None or an empty list.
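The depth concatenation of forward and backward outputs can be illustrated with a small pure-Python sketch. It assumes a single example (no batch dimension), no sequence_length handling, and a hypothetical running-sum cell; outputs are Python lists so that list concatenation stands in for depth concatenation.

```python
def sum_cell(x, state):
    """Hypothetical cell: running sum, with a one-feature output."""
    new_state = state + x
    return [new_state], new_state


def unroll(cell, inputs, state):
    """Run cell over a list of inputs, collecting one output per step."""
    outputs = []
    for x in inputs:
        out, state = cell(x, state)
        outputs.append(out)
    return outputs, state


def bidirectional(cell_fw, cell_bw, inputs, init_fw=0.0, init_bw=0.0):
    out_fw, state_fw = unroll(cell_fw, inputs, init_fw)
    # The backward RNN reads the sequence reversed; its outputs are then
    # reversed again so index t lines up with the forward outputs.
    out_bw_reversed, state_bw = unroll(cell_bw, inputs[::-1], init_bw)
    out_bw = out_bw_reversed[::-1]
    outputs = [fw + bw for fw, bw in zip(out_fw, out_bw)]
    return outputs, state_fw, state_bw


outputs, state_fw, state_bw = bidirectional(sum_cell, sum_cell, [1.0, 2.0, 3.0])
```

Each entry of outputs has cell_fw.output_size + cell_bw.output_size features, here 1 + 1.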
tf.contrib.rnn.stack_bidirectional_dynamic_rnn(cells_fw, cells_bw, inputs, initial_states_fw=None, initial_states_bw=None, dtype=None, sequence_length=None, scope=None) {#stack_bidirectional_dynamic_rnn}
Creates a dynamic bidirectional recurrent neural network.
Stacks several bidirectional rnn layers. The combined forward and backward layer outputs are used as input of the next layer. tf.bidirectional_rnn does not allow forward and backward information to be shared between layers. The input_size of the first forward and backward cells must match. The initial state for both directions is zero and no intermediate states are returned.
- cells_fw: List of instances of RNNCell, one per layer, to be used for forward direction.
- cells_bw: List of instances of RNNCell, one per layer, to be used for backward direction.
- inputs: A length T list of inputs, each a tensor of shape [batch_size, input_size], or a nested tuple of such elements.
- initial_states_fw: (optional) A list of the initial states (one per layer) for the forward RNN. Each tensor must have an appropriate type and shape [batch_size, cell_fw.state_size].
- initial_states_bw: (optional) Same as for initial_states_fw, but using the corresponding properties of cells_bw.
- dtype: (optional) The data type for the initial state. Required if either of the initial states are not provided.
- sequence_length: (optional) An int32/int64 vector, size [batch_size], containing the actual lengths for each of the sequences.
- scope: VariableScope for the created subgraph; defaults to None.
A tuple (outputs, output_state_fw, output_state_bw) where:
- outputs: Output Tensor shaped [batch_size, max_time, layers_output], where layers_output are depth-concatenated forward and backward outputs.
- output_states_fw is the final states, one tensor per layer, of the forward rnn.
- output_states_bw is the final states, one tensor per layer, of the backward rnn.
- TypeError: If cell_fw or cell_bw is not an instance of RNNCell.
- ValueError: If inputs is None, not a list or an empty list.
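Because each layer consumes the combined forward and backward outputs of the layer below, the input depth of every layer after the first is the sum of the two output sizes of the previous layer. The sketch below traces those depths; it assumes each cell's output_size is a plain int, and the function name stacked_depths is hypothetical.

```python
def stacked_depths(input_size, fw_output_sizes, bw_output_sizes):
    """Return (input_depth, output_depth) for each stacked bidirectional layer."""
    depths = []
    depth = input_size
    for fw_size, bw_size in zip(fw_output_sizes, bw_output_sizes):
        layer_input = depth
        depth = fw_size + bw_size  # depth-concatenated forward/backward output
        depths.append((layer_input, depth))
    return depths


# Hypothetical two-layer stack on 8-dimensional inputs.
depths = stacked_depths(8, fw_output_sizes=[16, 32], bw_output_sizes=[16, 32])
```

The last output depth in the list corresponds to layers_output in the return value described above.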