Adam (Adaptive Moment Estimation) is one of the most widely used optimization algorithms for training deep learning models. It combines the benefits of two other extensions of stochastic gradient descent: Momentum (which smooths the optimization path using an exponential moving average of gradients) and RMSProp (which scales the learning rate per parameter based on the magnitude of recent gradients). This allows it to handle sparse gradients and non-stationary objectives efficiently, often requiring little hyperparameter tuning.
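The combination of the two moving averages can be sketched in a few lines of NumPy. This is a minimal illustration of the update rule, not a production implementation; the hyperparameter defaults (lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8) follow the values proposed in the original paper, and the toy objective f(theta) = theta² is chosen only for demonstration:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update combining Momentum- and RMSProp-style statistics."""
    m = beta1 * m + (1 - beta1) * grad       # moving average of gradients (Momentum)
    v = beta2 * v + (1 - beta2) * grad**2    # moving average of squared gradients (RMSProp)
    m_hat = m / (1 - beta1**t)               # bias correction: averages start at zero,
    v_hat = v / (1 - beta2**t)               # so early estimates are scaled up
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy example: minimize f(theta) = theta^2, whose gradient is 2*theta.
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):          # Adam's bias correction assumes t starts at 1
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)  # converges toward the minimum at 0
```

Note the per-parameter scaling by `sqrt(v_hat)`: parameters with consistently large gradients take smaller effective steps, which is what lets Adam cope with poorly scaled or sparse gradients without manual learning-rate schedules.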
Adam was introduced by Diederik Kingma and Jimmy Ba in their 2014 paper "Adam: A Method for Stochastic Optimization".
It is a common default choice in PyTorch and TensorFlow for most tasks, from vision to NLP.