Global Average Pooling (GAP) is a pooling operation used in convolutional neural networks (CNNs) that reduces each feature map to a single value by averaging over its entire spatial extent. This collapses the spatial dimensions of the feature maps, producing one value per channel and yielding a low-dimensional feature vector that can be fed directly into a softmax layer for classification. Unlike fully-connected layers, GAP has no trainable parameters, which helps to prevent overfitting.
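The operation can be sketched in a few lines of NumPy; this is a minimal illustration, assuming feature maps stored in channels-first layout (the function name and shapes are illustrative, not from any particular library):

```python
import numpy as np

def global_average_pool(feature_maps):
    """Reduce each feature map to its mean over the spatial axes.

    feature_maps: array of shape (channels, height, width).
    Returns a vector of shape (channels,), one value per feature map.
    """
    return feature_maps.mean(axis=(1, 2))

# Example: 3 feature maps of size 4x4 collapse to a 3-element vector.
fmaps = np.arange(48, dtype=float).reshape(3, 4, 4)
vec = global_average_pool(fmaps)
print(vec.shape)  # (3,)
print(vec)        # [ 7.5 23.5 39.5]
```

Because the averaging is over the full spatial extent, the output size depends only on the number of channels, so the same network head works for inputs of varying spatial resolution.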
The concept was introduced by Min Lin, Qiang Chen, and Shuicheng Yan in their 2013 paper "Network in Network". They proposed it as a structurally simpler alternative to traditional fully-connected layers at the end of a CNN, aiming to improve model interpretability and reduce overfitting.
GAP has become a standard component in many modern CNN architectures, including ResNet, GoogLeNet (Inception), and EfficientNet. Its effectiveness at reducing model complexity and improving generalization has made it a popular choice for a wide range of computer vision tasks.