machine learning – What is the meaning of the word logits in TensorFlow?

Logits is an overloaded term which can mean many different things:


In math, logit is a function that maps probabilities (the open interval (0, 1)) to R ((-inf, inf)): logit(p) = log(p / (1 - p)).

A probability of 0.5 corresponds to a logit of 0. Negative logits correspond to probabilities less than 0.5, positive logits to probabilities greater than 0.5.
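
To make the mapping concrete, here is a small plain-Python sketch. The helper name logit is my own choice for illustration and is not a TensorFlow API.

```python
import math

def logit(p):
    """Log-odds of a probability p; only finite for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is only finite for 0 < p < 1")
    return math.log(p / (1.0 - p))

# Probabilities below 0.5 give negative values, above 0.5 positive ones.
for p in (0.1, 0.5, 0.9):
    print(p, logit(p))
```

The printed values are symmetric around 0, which is the logit of 0.5.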

In ML, it can be:

the vector of raw (non-normalized) predictions that a classification
model generates, which is ordinarily then passed to a normalization
function. If the model is solving a multi-class classification
problem, logits typically become an input to the softmax function. The
softmax function then generates a vector of (normalized) probabilities
with one value for each possible class.
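
As a rough sketch of the normalization step described in that definition, the following plain-Python softmax turns a vector of raw scores into probabilities. The scores are made-up numbers, and the function is a hand-written illustration rather than any library's implementation.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

raw_scores = [2.0, -1.0, 0.5]  # made-up logits for three classes
probs = softmax(raw_scores)
print(probs)
```

Each output lies between 0 and 1, and the largest raw score gets the largest probability.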

Logits also sometimes refer to the element-wise inverse of the sigmoid function.

Just adding this clarification so that anyone who scrolls down this far can at least get it right, since there are so many wrong answers upvoted.

Diansheng's answer and JakeJ's answer get it right.
A new answer posted by Shital Shah is even better and more complete.


Yes, logit is a mathematical function in statistics, but the logit used in the context of neural networks is different. The statistical logit doesn't even make sense here.


I couldn't find a formal definition anywhere, but logit basically means:

The raw predictions which come out of the last layer of the neural network.
1. This is the very tensor on which you apply the argmax function to get the predicted class.
2. This is the very tensor which you feed into the softmax function to get the probabilities for the predicted classes.


Also, from a tutorial on the official TensorFlow website:

Logits Layer

The final layer in our neural network is the logits layer, which will return the raw values for our predictions. We create a dense layer with 10 neurons (one for each target class 0–9), with linear activation (the default):

logits = tf.layers.dense(inputs=dropout, units=10)

If you are still confused, the situation is like this:

raw_predictions = neural_net(input_layer)
predicted_class_index_by_raw = argmax(raw_predictions)
probabilities = softmax(raw_predictions)
predicted_class_index_by_prob = argmax(probabilities)

where predicted_class_index_by_raw and predicted_class_index_by_prob will be equal, because softmax preserves the ordering of its inputs.

Another name for raw_predictions in the above code is logits.
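
If you want to check this yourself, here is a runnable plain-Python version of the snippet above. The hard-coded list stands in for neural_net(input_layer), and softmax and argmax are small helpers written just for this illustration.

```python
import math

def softmax(logits):
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

raw_predictions = [1.3, -0.7, 4.2, 0.0]  # stand-in for neural_net(input_layer)
predicted_class_index_by_raw = argmax(raw_predictions)
probabilities = softmax(raw_predictions)
predicted_class_index_by_prob = argmax(probabilities)
print(predicted_class_index_by_raw, predicted_class_index_by_prob)
```

Both indices come out the same, so you only need softmax when you want the probabilities themselves.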


As for why it is called logit… I have no idea. Sorry.
[Edit: See this answer for the historical motivations behind the term.]


Trivia

If you want to, though, you can apply the statistical logit to the probabilities that come out of the softmax function.

If the probability of a certain class is p, then the log-odds of that class is L = logit(p).

Also, the probability of that class can be recovered as p = sigmoid(L), using the sigmoid function.

Not very useful to calculate log-odds though.
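
For the curious, here is a plain-Python sketch of that round trip. The raw scores are made up and the helper functions are written only for this illustration.

```python
import math

def softmax(logits):
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit(p):
    """Statistical logit: the log-odds of probability p."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

raw = [2.0, -1.0, 0.5]      # made-up raw predictions for three classes
p = softmax(raw)[0]         # probability of class 0
L = logit(p)                # log-odds of class 0
recovered = sigmoid(L)      # back to the probability
print(p, L, recovered)
```

Note that L is in general not the same number as the raw prediction for that class (2.0 here), which is one more reason the two meanings of logit should not be mixed up.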

Summary

In the context of deep learning, the logits layer means the layer that feeds into softmax (or another such normalization). The output of the softmax is the probabilities for the classification task, and its input is the logits layer. The logits layer typically produces values from -infinity to +infinity, and the softmax layer transforms them to values from 0 to 1.

Historical Context

Where does this term come from? In the 1930s and 40s, several people were trying to adapt linear regression to the problem of predicting probabilities. However, linear regression produces output from -infinity to +infinity, while for probabilities the desired output is 0 to 1. One way to handle this is to somehow map the probabilities from 0 to 1 onto -infinity to +infinity and then use linear regression as usual. One such mapping is the inverse of the cumulative normal distribution, which was used by Chester Ittner Bliss in 1934; he called this the probit model, short for probability unit. However, this function is computationally expensive and lacks some of the desirable properties for multi-class classification. In 1944 Joseph Berkson used the function log(p/(1-p)) to do this mapping and called it logit, short for logistic unit. The term logistic regression derives from this as well.

The Confusion

Unfortunately, the term logits is abused in deep learning. From a purely mathematical perspective, logit is a function that performs the above mapping. In deep learning, people started calling the layer that feeds into the softmax function the logits layer. Then people started calling the output values of this layer logits, creating confusion with logit the function.

TensorFlow Code

Unfortunately, TensorFlow code adds further to the confusion with names like tf.nn.softmax_cross_entropy_with_logits. What does logits mean here? It just means that the input of the function is supposed to be the output of the last neuron layer, as described above. The _with_logits suffix is redundant, confusing and pointless. Functions should be named without regard to such very specific contexts, because they are simply mathematical operations that can be performed on values derived from many other domains. In fact, TensorFlow has another similar function, sparse_softmax_cross_entropy, where they fortunately forgot to add the _with_logits suffix, creating inconsistency and adding to the confusion. PyTorch, on the other hand, simply names its functions without these kinds of suffixes.
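
To show what "the input is the output of the last layer" amounts to, here is a plain-Python sketch of a softmax cross-entropy computed directly from raw scores for a single example. This is my own illustration under the usual definition of the loss, -log(softmax(logits)[label]), and not TensorFlow's actual implementation; the function name and the scores are hypothetical.

```python
import math

def softmax_cross_entropy_from_logits(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    m = max(logits)  # shift by the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    # -log(exp(logits[label]) / sum(exp(logits))) simplifies to this:
    return log_sum_exp - logits[label]

raw_scores = [2.0, -1.0, 0.5]  # made-up output of the last layer
loss = softmax_cross_entropy_from_logits(raw_scores, label=0)
print(loss)
```

Working on the raw scores instead of on already-normalized probabilities is what lets the computation stay numerically stable, which is presumably why such functions ask for the unnormalized values.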

Reference

The Logit/Probit lecture slides are one of the best resources for understanding logit. I have also updated the Wikipedia article with some of the above information.
