The most widely used activation function in deep learning, defined as f(x) = max(0, x). It outputs the input directly if it is positive; otherwise, it outputs zero. Its simplicity enables efficient computation, and its constant gradient of 1 for positive inputs helps mitigate the 'vanishing gradient' problem that plagued earlier functions like Sigmoid and Tanh, allowing for the training of much deeper neural networks.
Popularized by Nair & Hinton (2010) and solidified by its success in AlexNet (2012).
The default activation function for Convolutional Neural Networks (CNNs) and many other deep architectures.
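The definition above can be sketched in a few lines of NumPy; the function names here are illustrative, not from any particular framework:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied elementwise
    return np.maximum(0, x)

def relu_grad(x):
    # Gradient is 1 where x > 0, else 0; the flat gradient of 1
    # for positive inputs is what mitigates vanishing gradients.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))       # negative inputs clamp to 0
print(relu_grad(x))  # gradient passes through unchanged for positive inputs
```

Note that the gradient at exactly x = 0 is undefined mathematically; implementations conventionally pick 0 (as above) or 1.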