The core optimization algorithm in which model weights are updated using the gradient of the loss computed on a single random sample or a small batch, rather than on the full dataset. The resulting noise in the updates is often helpful, letting training escape poor local minima and saddle points.
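A minimal sketch of the idea, fitting a line with single-sample SGD updates. The data, learning rate, and epoch count here are illustrative assumptions, not anything specified above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data: y = 3x + 2 plus noise (assumed, not from the source)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=200)

w, b = 0.0, 0.0          # model parameters, initialized at zero
learning_rate = 0.1      # step size (hypothetical choice)
n_epochs = 20

for epoch in range(n_epochs):
    # Visit samples in a fresh random order each epoch -- the "stochastic" part
    for i in rng.permutation(len(X)):
        x_i, y_i = X[i, 0], y[i]
        y_pred = w * x_i + b
        error = y_pred - y_i
        # Gradient of the squared loss 0.5 * (y_pred - y)^2 from ONE sample
        grad_w = error * x_i
        grad_b = error
        # The SGD update: step against the noisy single-sample gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(f"learned w={w:.3f}, b={b:.3f}  (true values: w=3, b=2)")
```

Each update here uses one sample's gradient; averaging the gradient over a small batch before stepping gives the mini-batch variant, which trades a little noise for more stable, hardware-friendly updates.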
Where did the term "Stochastic Gradient Descent (SGD)" come from?
It traces to Robbins & Monro (1951), whose stochastic approximation method established the idea, and the convergence theory, of iterating with noisy sample-based estimates in place of exact gradients.
How is "Stochastic Gradient Descent (SGD)" used today?
It is the grandfather of all modern neural network training: popular optimizers such as SGD with momentum and Adam are extensions of its basic update rule.