
How to choose cross-entropy loss in tensorflow?

Source: https://stackoverflow.com/questions/47034888/how-to-choose-cross-entropy-loss-in-tensorflow

Classification problems, such as logistic regression or multinomial logistic regression, optimize a cross-entropy loss.
Normally, the cross-entropy layer follows the softmax layer, which produces a probability distribution.

In tensorflow, there are at least a dozen different cross-entropy loss functions:

tf.losses.softmax_cross_entropy
tf.losses.sparse_softmax_cross_entropy
tf.losses.sigmoid_cross_entropy
tf.contrib.losses.softmax_cross_entropy
tf.contrib.losses.sigmoid_cross_entropy
tf.nn.softmax_cross_entropy_with_logits
tf.nn.sigmoid_cross_entropy_with_logits
...

Which of them work only for binary classification, and which are suitable for multi-class problems? When should you use sigmoid instead of softmax? How are the sparse functions different from the others, and why do they exist only for softmax?

Related (more math-oriented) discussion: cross-entropy jungle.


1 Answer


Preliminary facts

In a functional sense, the sigmoid is a special case of the softmax function where the number of classes equals 2. Both of them do the same operation: they transform the logits (see below) into probabilities.

In simple binary classification, there's no big difference between the two. However, in multinomial classification, sigmoid allows you to deal with non-exclusive labels (a.k.a. multi-labels), while softmax deals with exclusive classes (see below).

A logit (also called a score) is a raw, unscaled value associated with a class, before computing the probability. In terms of neural network architecture, this means that a logit is the output of a dense (fully-connected) layer.

Tensorflow naming is a bit strange: all of the functions below accept logits,
not probabilities, and apply the transformation themselves (which is simply more efficient).
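
For instance, here is a minimal sketch (assuming TensorFlow 1.x and made-up logit values) showing that both functions turn logits into probabilities, and that the 2-class softmax reduces to a sigmoid of the logit difference:

    import tensorflow as tf

    # Hypothetical logits for a 2-class problem, shape [batch_size, 2].
    logits = tf.constant([[2.0, 0.0], [-1.0, 0.0]])

    # Softmax over both classes vs. sigmoid of the logit difference:
    # softmax(logits)[:, 0] equals sigmoid(logits[:, 0] - logits[:, 1]).
    softmax_prob = tf.nn.softmax(logits)[:, 0]
    sigmoid_prob = tf.sigmoid(logits[:, 0] - logits[:, 1])

    with tf.Session() as sess:
        print(sess.run([softmax_prob, sigmoid_prob]))  # the two results match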


Sigmoid functions family

tf.nn.sigmoid_cross_entropy_with_logits
tf.nn.weighted_cross_entropy_with_logits
tf.losses.sigmoid_cross_entropy
tf.contrib.losses.sigmoid_cross_entropy (DEPRECATED)

As stated earlier, the sigmoid loss function is for binary classification. But the tensorflow functions are more general and also allow multi-label classification, where the classes are independent. In other words, tf.nn.sigmoid_cross_entropy_with_logits solves N binary classifications at once.

The labels must be one-hot encoded or can contain soft class probabilities.

tf.losses.sigmoid_cross_entropy additionally allows you to set in-batch weights, i.e. make some examples more important than others. tf.nn.weighted_cross_entropy_with_logits allows you to set class weights (remember, the classification is binary), i.e. make positive errors cost more than negative errors. This is useful when the training data is unbalanced.
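
As an illustration, here is a minimal sketch (TensorFlow 1.x, with made-up logits and multi-hot labels) of a 3-label problem; the weights value passed to tf.losses.sigmoid_cross_entropy is just an example of down-weighting one sample:

    import tensorflow as tf

    # Hypothetical multi-label setup: 4 examples, 3 independent classes.
    logits = tf.constant([[ 1.2, -0.5,  3.0],
                          [-2.0,  0.3,  0.1],
                          [ 0.0,  1.5, -1.0],
                          [ 2.2, -0.7,  0.4]])
    labels = tf.constant([[1., 0., 1.],      # per-class indicators
                          [0., 1., 0.],      # (soft probabilities also work)
                          [0., 1., 0.],
                          [1., 0., 1.]])

    # Element-wise loss of shape [4, 3]: one binary problem per class.
    per_class = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
    loss = tf.reduce_mean(per_class)

    # tf.losses variant with in-batch weights (last example counts half as much).
    weighted_loss = tf.losses.sigmoid_cross_entropy(
        multi_class_labels=labels, logits=logits,
        weights=tf.constant([[1.0], [1.0], [1.0], [0.5]]))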


Softmax functions family

tf.nn.softmax_cross_entropy_with_logits
tf.losses.softmax_cross_entropy
tf.contrib.losses.softmax_cross_entropy (DEPRECATED)

These loss functions should be used for multinomial mutually exclusive classification, i.e. pick one out of N classes. They are also applicable when N = 2.

The labels must be one-hot encoded or can contain soft class probabilities: a particular example can belong to class A with 50% probability and class B with 50% probability. Note that strictly speaking it doesn't mean that it belongs to both classes, but one
can interpret the probabilities this way.

Just like in the sigmoid family, tf.losses.softmax_cross_entropy allows you to set in-batch weights, i.e. make some examples more important than others. As far as I know, as of tensorflow 1.3, there's no built-in way to set class weights.
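
Here is a minimal sketch (TensorFlow 1.x, invented numbers) of an exclusive 3-class problem with one-hot / soft labels, plus the tf.losses variant with per-example weights:

    import tensorflow as tf

    # Hypothetical 3-class exclusive classification, 2 examples.
    logits = tf.constant([[2.0, 0.5, -1.0],
                          [0.1, 0.2,  3.0]])
    labels = tf.constant([[1.0, 0.0, 0.0],     # one-hot label
                          [0.0, 0.5, 0.5]])    # soft labels are also allowed

    # Per-example loss of shape [batch_size].
    per_example = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    loss = tf.reduce_mean(per_example)

    # tf.losses variant with in-batch weights (second example counts twice).
    weighted_loss = tf.losses.softmax_cross_entropy(
        onehot_labels=labels, logits=logits, weights=tf.constant([1.0, 2.0]))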


Sparse functions family

tf.nn.sparse_softmax_cross_entropy_with_logits
tf.losses.sparse_softmax_cross_entropy
tf.contrib.losses.sparse_softmax_cross_entropy (DEPRECATED)

Like the ordinary softmax above, these loss functions should be used for multinomial mutually exclusive classification, i.e. pick one out of N classes. The difference is in the label encoding: the classes are specified as integers (class indices), not one-hot vectors. Obviously, this doesn't allow soft classes, but it can save some memory when there are thousands or millions of classes. Note, however, that the logits argument must still contain logits for each class, so it consumes at least [batch_size, classes] memory.

Like above, the tf.losses version has a weights argument which allows you to set in-batch weights.
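
The same toy problem as above, but with integer class indices instead of one-hot vectors (again a sketch assuming TensorFlow 1.x):

    import tensorflow as tf

    logits = tf.constant([[2.0, 0.5, -1.0],
                          [0.1, 0.2,  3.0]])
    labels = tf.constant([0, 2])               # class indices, shape [batch_size]

    # Per-example loss of shape [batch_size]; no one-hot encoding needed.
    per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    loss = tf.reduce_mean(per_example)

    # tf.losses variant with in-batch weights.
    weighted_loss = tf.losses.sparse_softmax_cross_entropy(
        labels=labels, logits=logits, weights=tf.constant([1.0, 0.5]))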


Sampled softmax functions family

tf.nn.sampled_softmax_loss
tf.contrib.nn.rank_sampled_softmax_loss
tf.nn.nce_loss

These functions provide another alternative for dealing with a huge number of classes. Instead of computing and comparing an exact probability distribution, they compute a loss estimate from a random sample.

The arguments weights and biases specify a separate fully-connected layer that is used to compute the logits for a chosen sample.

Like above, labels are not one-hot encoded, but have the shape [batch_size, num_true].

Sampled functions are only suitable for training. At test time, it's recommended to use a standard softmax loss (either sparse or one-hot) to get an actual distribution.
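
For example, here is a sketch of this train-then-evaluate pattern (TensorFlow 1.x; the variable names, sizes, and random tensors are placeholders standing in for a real model's last hidden layer and output layer):

    import tensorflow as tf

    num_classes, embed_dim = 100000, 128       # hypothetical huge output vocabulary
    batch_size, num_sampled = 32, 64

    # "weights" and "biases" describe the final dense layer explicitly.
    softmax_w = tf.get_variable("softmax_w", [num_classes, embed_dim])
    softmax_b = tf.get_variable("softmax_b", [num_classes])

    inputs = tf.random_normal([batch_size, embed_dim])    # stand-in for the last hidden layer
    labels = tf.random_uniform([batch_size, 1],           # shape [batch_size, num_true]
                               maxval=num_classes, dtype=tf.int64)

    # Training: loss estimated from a random sample of negative classes.
    train_loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
        weights=softmax_w, biases=softmax_b,
        labels=labels, inputs=inputs,
        num_sampled=num_sampled, num_classes=num_classes))

    # Test time: compute the full logits and use a standard sparse softmax loss.
    full_logits = tf.matmul(inputs, softmax_w, transpose_b=True) + softmax_b
    eval_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.squeeze(labels, axis=1), logits=full_logits))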

Another alternative loss is tf.nn.nce_loss, which performs noise-contrastive estimation (if you're interested, see this very detailed discussion). I've included this function in the softmax family, because NCE guarantees an approximation to softmax in the limit.
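
A corresponding sketch for tf.nn.nce_loss, which takes the same weights/biases/labels/inputs arguments (again with made-up sizes and random placeholder tensors):

    import tensorflow as tf

    num_classes, dim, batch_size, num_sampled = 100000, 128, 32, 64   # hypothetical sizes

    nce_w = tf.get_variable("nce_w", [num_classes, dim])
    nce_b = tf.get_variable("nce_b", [num_classes])
    inputs = tf.random_normal([batch_size, dim])
    labels = tf.random_uniform([batch_size, 1], maxval=num_classes, dtype=tf.int64)

    # Same argument layout as tf.nn.sampled_softmax_loss, but a noise-contrastive objective.
    nce_loss = tf.reduce_mean(tf.nn.nce_loss(
        weights=nce_w, biases=nce_b, labels=labels, inputs=inputs,
        num_sampled=num_sampled, num_classes=num_classes))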
