
Keras: Binary_crossentropy & Categorical_crossentropy Confusion

After using TensorFlow for quite a while, I have read some Keras tutorials and implemented some examples. I have found several tutorials for convolutional autoencoders that use Keras' binary_crossentropy as the loss function.

Solution 1:

You are right about the areas where each of these losses is applicable (see the sketch after the list):

  • binary_crossentropy (and tf.nn.sigmoid_cross_entropy_with_logits under the hood) is for binary multi-label classification (labels are independent).
  • categorical_crossentropy (and tf.nn.softmax_cross_entropy_with_logits under the hood) is for multi-class classification (classes are exclusive).

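Here is a minimal sketch of the two setups; the layer sizes and variable names are made up for illustration:

import tensorflow as tf

NUM_FEATURES, NUM_CLASSES = 32, 5  # hypothetical sizes

# Multi-label: independent sigmoid outputs + binary_crossentropy.
multi_label = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(NUM_CLASSES, activation='sigmoid'),
])
multi_label.compile(optimizer='adam', loss='binary_crossentropy')

# Multi-class: softmax output (rows sum to 1) + categorical_crossentropy.
multi_class = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])
multi_class.compile(optimizer='adam', loss='categorical_crossentropy')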
See also the detailed analysis in this question.

I'm not sure what tutorials you mean, so can't comment whether binary_crossentropy is a good or bad choice for autoencoders.

As for the naming, it is absolutely correct and reasonable. Or do you think sigmoid and softmax names sound better?

So the only confusion left in your question is the categorical_crossentropy documentation. Note that everything stated there is correct: the loss supports the one-hot representation. With the TensorFlow backend, the function in fact works with any probability distribution over the labels (in addition to one-hot vectors); this could be mentioned in the doc, but it doesn't look critical to me. Moreover, one would need to check whether soft classes are supported by the other backends, Theano and CNTK. Remember that Keras tries to be minimalistic and targets the most popular use cases, so I can understand the logic here.
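For instance, here is a quick check (with the TensorFlow backend) that categorical_crossentropy accepts a soft label distribution; the numbers are arbitrary and only for illustration:

import tensorflow as tf

# Soft (non-one-hot) target distribution over 3 classes.
y_true = tf.constant([[0.7, 0.2, 0.1]])
y_pred = tf.constant([[0.5, 0.3, 0.2]])

# Standard cross-entropy: -sum(y_true * log(y_pred)) per sample.
loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
print(loss.numpy())  # ~0.887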

Solution 2:

Not sure if this answers your question, but with the softmax (categorical cross-entropy) loss the output layer needs to be a probability distribution (i.e., sum to 1), while with binary cross-entropy it doesn't. Simple as that. ("Binary" doesn't mean that there are only 2 output classes; it means that each output is treated as an independent binary decision.)
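A quick numeric illustration of this point (the logits are arbitrary): softmax outputs always sum to 1, while sigmoid outputs are independent and need not:

import numpy as np
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.5]])  # arbitrary example logits

softmax_out = tf.nn.softmax(logits)   # a proper distribution
sigmoid_out = tf.nn.sigmoid(logits)   # three independent probabilities

print(np.sum(softmax_out.numpy()))  # 1.0
print(np.sum(sigmoid_out.numpy()))  # ~2.23, not constrained to 1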

Solution 3:

The documentation doesn't mention that BinaryCrossentropy can be used for multi-label classification, which can be confusing. But it can also be used for an ordinary binary classifier (two exclusive classes, as in the classic cats-vs-dogs example). In that case, we have to give the output layer a single unit (n_classes=1):

tf.keras.layers.Dense(units=1)  # single output unit for a 2-class problem
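For completeness, a minimal sketch of such a 2-class classifier; the input size and optimizer are assumptions:

import tensorflow as tf

# Hypothetical binary (cats vs. dogs) classifier with one sigmoid output.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),                    # assumed feature size
    tf.keras.layers.Dense(units=1, activation='sigmoid'),  # n_classes = 1 unit
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(),   # expects probabilities
              metrics=['accuracy'])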

Also, the BinaryCrossentropy class and the tf.keras.losses.binary_crossentropy function behave differently: by default, the class reduces the per-sample losses to a single scalar (the mean), while the function returns one loss value per sample, as the example below shows.

Let's look at the example from the documentation to confirm that it is indeed meant for multi-label classification.

import numpy as np
import tensorflow as tf

# Two samples, two labels per sample (multi-label targets).
y_true = tf.convert_to_tensor([[0, 1], [0, 0]])
y_pred = tf.convert_to_tensor([[0.6, 0.4], [0.4, 0.6]])

# The class reduces the per-sample losses to a single scalar (the mean).
bce = tf.keras.losses.BinaryCrossentropy()
loss1 = bce(y_true=y_true, y_pred=y_pred)
# <tf.Tensor: shape=(), dtype=float32, numpy=0.81492424>

# The function returns one loss value per sample, with no reduction.
loss2 = tf.keras.losses.binary_crossentropy(y_true, y_pred)
# <tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.9162905 , 0.71355796], dtype=float32)>

# Averaging the per-sample losses reproduces the class result.
np.mean(loss2.numpy())
# 0.81492424

# By contrast, SparseCategoricalCrossentropy treats each row of y_pred as a
# single distribution over exclusive classes (here each row sums to 1), and
# the integer labels pick exactly one class per sample.
scce = tf.keras.losses.SparseCategoricalCrossentropy()
y_true = tf.convert_to_tensor([0, 0])
scce(y_true, y_pred)
# <tf.Tensor: shape=(), dtype=float32, numpy=0.71355814>
y_true = tf.convert_to_tensor([1, 0])
scce(y_true, y_pred)
# <tf.Tensor: shape=(), dtype=float32, numpy=0.9162907>
