Softmax is a mathematical function that converts a vector of real numbers (logits) into a vector of probabilities, where each output probability is proportional to the exponential of the corresponding input value. The output values lie between 0 and 1 and sum to 1. It is commonly used as the final activation function of a neural network, normalizing the network's raw outputs into a probability distribution over the predicted output classes.
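The computation can be sketched in a few lines of Python. The implementation below subtracts the maximum logit before exponentiating, a standard trick to avoid floating-point overflow; the function name and example logits are illustrative, not from any particular library.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability:
    # exp() of a large logit would otherwise overflow,
    # and the shift does not change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # Each output is the exponential of its logit,
    # normalized so the outputs sum to 1.
    return [e / total for e in exps]

# Hypothetical logits for three classes.
probs = softmax([2.0, 1.0, 0.1])
# probs are all in (0, 1), sum to 1, and preserve the
# ordering of the logits (largest logit -> largest probability).
```

Note that shifting all logits by a constant leaves the output unchanged, which is exactly why the max-subtraction trick is safe.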
The function is a generalization of the logistic function to multiple dimensions. It was named 'softmax' in the context of neural networks by John Bridle in 1990, but the mathematical form originates in statistical mechanics, where it appears as the Boltzmann distribution.
Softmax is the standard activation function for the output layer of multi-class classification neural networks. It is essential for tasks where the model needs to choose one class out of many (e.g., classifying an image as a dog, cat, or bird).