An operation that averages each channel of a feature map over its spatial dimensions (height and width), collapsing it to a single value per channel. Used to connect convolutional layers to the final classification layer.
Has no learnable parameters of its own; replaces the parameter-heavy fully connected layers that would otherwise follow a flattened feature map.
Used in modern architectures (ResNet, EfficientNet).
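
A minimal NumPy sketch of the operation, assuming feature maps are laid out as (batch, channels, height, width). The function name global_average_pool and the sizes (512 channels, 7x7 maps, 1000 classes) are hypothetical and chosen only for illustration; they are not taken from any particular architecture. The second half compares the parameter count of a classification layer fed by a flattened feature map with one fed by the pooled vector.

```python
import numpy as np


def global_average_pool(feature_maps: np.ndarray) -> np.ndarray:
    """Average each channel over its spatial dimensions.

    feature_maps has shape (batch, channels, height, width);
    the result has shape (batch, channels).
    """
    return feature_maps.mean(axis=(2, 3))


# Hypothetical sizes, chosen only for illustration.
batch, channels, height, width, num_classes = 2, 512, 7, 7, 1000

x = np.random.default_rng(0).standard_normal((batch, channels, height, width))
pooled = global_average_pool(x)
print(pooled.shape)  # (2, 512)

# Weights plus biases of the classification layer in each design.
params_flatten_fc = channels * height * width * num_classes + num_classes
params_gap_fc = channels * num_classes + num_classes
print(params_flatten_fc)  # 25089000
print(params_gap_fc)      # 513000
```

Under these assumed sizes the pooled design needs roughly 49 times fewer parameters in the classification layer, since the pooling step itself contributes none.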