Metrics are used in evaluation to assess the quality of a model. Most are "streaming" ops: they create variables that accumulate a running total, and they return two tensors, an update tensor that adds the current batch to these variables and a value tensor that reads the accumulated result. Example:
value, update_op = metrics.streaming_mean_squared_error(predictions, targets, weight)
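The split between accumulator variables, an update step and a value read can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the library's implementation: the class StreamingMeanSquaredError and its update/value methods are hypothetical, two plain floats stand in for the accumulator variables the real op creates, and returning 0.0 before any weight has been seen is an assumption of the sketch.

```python
class StreamingMeanSquaredError:
    """Hypothetical pure-Python analogue of a streaming metric.

    `total` and `count` play the role of the accumulator variables;
    `update` mimics the update tensor and `value` mimics the value tensor.
    """

    def __init__(self):
        self.total = 0.0  # running sum of weighted squared errors
        self.count = 0.0  # running sum of weights

    def update(self, predictions, targets, weights=None):
        """Fold one batch into the accumulators and return the new value."""
        if weights is None:
            weights = [1.0] * len(predictions)
        for p, t, w in zip(predictions, targets, weights):
            self.total += w * (p - t) ** 2
            self.count += w
        return self.value()

    def value(self):
        """Read the accumulated mean (0.0 before any weight is seen, assumed)."""
        return self.total / self.count if self.count > 0 else 0.0


mse = StreamingMeanSquaredError()
mse.update([1.0, 2.0], [1.0, 4.0])  # squared errors 0 and 4
mse.update([3.0], [0.0])            # squared error 9
print(mse.value())                  # mean of 0, 4 and 9
```

Because the accumulators persist between calls, the value after the second batch reflects all three samples, not just the last batch.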
Most metric functions take a pair of tensors: predictions and ground-truth
targets. (streaming_mean is an exception: it takes a single value tensor,
usually a loss.) Both tensors are assumed to have the shape
[batch_size, d1, ... dN], where batch_size is the number of samples in the
batch and d1 ... dN are the remaining dimensions.
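To make the shape convention concrete, here is a small sketch in which nested Python lists stand in for tensors. The helper shape_of is hypothetical and only inspects the first element at each nesting level, so it assumes its input is rectangular; it shows how predictions and targets are expected to agree and how batch_size is the leading dimension.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple.

    Hypothetical helper: a scalar has shape (); a list of N items has
    shape (N,) followed by the shape of its first item.
    """
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)


# A batch of 2 samples, each with one remaining dimension d1 = 3.
predictions = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]
targets = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

# Predictions and targets must share the same [batch_size, d1, ... dN] shape.
assert shape_of(predictions) == shape_of(targets), "shapes must match"
batch_size, *remaining_dims = shape_of(predictions)
print(batch_size, remaining_dims)  # 2 [3]
```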
The weight parameter can be used to adjust the relative weight of samples
within the batch. The result of each metric is a scalar: the weighted average
of the per-sample values, in which samples with zero weight do not contribute.
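The effect of the weights can be checked with a short worked example. This is a minimal sketch of a weighted average, not the library code; the function name weighted_average is hypothetical, and returning 0.0 when every weight is zero is an assumption made here to avoid dividing by zero.

```python
def weighted_average(values, weights):
    """Weighted mean of per-sample values.

    A sample with weight 0 adds nothing to either sum, so it is
    effectively excluded. Returning 0.0 when all weights are zero is
    an assumption of this sketch.
    """
    total = sum(w * v for v, w in zip(values, weights))
    weight_sum = sum(weights)
    return total / weight_sum if weight_sum > 0 else 0.0


per_sample_loss = [1.0, 5.0, 3.0]
weights = [1.0, 0.0, 1.0]  # mask out the second sample
print(weighted_average(per_sample_loss, weights))  # 2.0
```

Zeroing a weight is therefore a convenient way to mask out padded or invalid samples in a batch.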
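The result is two tensors, which should be used as follows for each evaluation
run:

import tensorflow as tf

predictions = ...
labels = ...
value, update_op = some_metric(predictions, labels)
with tf.Session() as sess:
  # The metric's accumulators are local variables; initialize them first.
  sess.run(tf.local_variables_initializer())
  for step_num in range(max_steps):
    update_op.run()
  print("evaluation score: ", value.eval())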
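The same protocol can be simulated in plain Python to show why each evaluation run must start from fresh accumulators and why the value is read only after all updates. In this sketch make_streaming_mean, run_eval and the batch data are hypothetical; make_streaming_mean mirrors the streaming_mean case, taking a single value per sample (usually a loss). In TensorFlow the equivalent reset is re-initializing the metric's local variables.

```python
def make_streaming_mean():
    """Return a (value, update) pair sharing hidden accumulator state.

    Hypothetical analogue of a streaming metric: `update` folds in one
    batch of values and `value` reads the running mean.
    """
    state = {"total": 0.0, "count": 0}

    def update(batch):
        state["total"] += sum(batch)
        state["count"] += len(batch)

    def value():
        return state["total"] / state["count"] if state["count"] else 0.0

    return value, update


def run_eval(batches):
    """One evaluation run: fresh accumulators, one update per batch."""
    value, update_op = make_streaming_mean()
    for batch in batches:
        update_op(batch)
    return value()  # read once, after all updates


batches = [[0.5, 1.5], [2.0], [4.0, 2.0]]
print(run_eval(batches))  # 2.0
print(run_eval(batches))  # 2.0 again: the second run starts from zero
```

Reusing one (value, update) pair across runs without a reset would instead blend the results of every run into a single average.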