The most popular optimization algorithm for training deep learning models. It adapts the learning rate for each parameter using running estimates of the first and second moments of the gradients, combining momentum with the per-parameter scaling ideas of AdaGrad and RMSProp.
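To make the per-parameter adaptation concrete, here is a minimal NumPy sketch of a single Adam step; the function name `adam_update`, the toy quadratic objective, and the chosen hyperparameter values are illustrative assumptions, not any particular library's API.

```python
import numpy as np

def adam_update(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update biased running estimates of the gradient's first and second moments.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Correct the bias toward zero caused by initializing m and v at 0.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: a larger second-moment estimate shrinks the effective step.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([5.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 501):
    grad = 2 * theta
    theta, m, v = adam_update(theta, grad, m, v, t, lr=0.1)
print(theta)  # close to 0 after a few hundred steps
```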
Where did the term "Adam (Adaptive Moment Estimation)" come from?
The name is short for Adaptive Moment Estimation, because the algorithm estimates the first moment (mean) and second moment (uncentered variance) of the gradients. Both the method and the name were introduced by Diederik Kingma and Jimmy Ba in the paper "Adam: A Method for Stochastic Optimization" (2014, presented at ICLR 2015).
How is "Adam (Adaptive Moment Estimation)" used today?
It has been the de facto standard optimizer for training deep learning models since roughly 2015 and ships as a built-in optimizer in the major deep learning frameworks.