Stochastic Gradient Descent (SGD)

What is Stochastic Gradient Descent (SGD)?

The core optimization algorithm of machine learning: model weights are updated using the gradient of the loss computed on a single random sample or a small mini-batch rather than the full dataset. Each step is cheap, and the resulting noise in the updates can help the optimizer escape poor local minima and saddle points.
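
A minimal sketch of the update rule (w ← w − lr · grad) in NumPy, fitting a one-parameter linear model with mini-batch SGD; the data, names, and constants here are illustrative, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for a one-parameter model: y = 3x + noise
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

w = 0.0          # weight to learn
lr = 0.1         # learning rate (step size)
batch_size = 8

for epoch in range(20):
    idx = rng.permutation(len(X))            # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch, 0], y[batch]
        pred = w * xb
        # Gradient of mean squared error w.r.t. w, on this mini-batch only
        grad = 2.0 * np.mean((pred - yb) * xb)
        w -= lr * grad                       # the SGD update: w <- w - lr * grad

print(round(w, 2))  # converges to ~3.0
```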

Where did the term "Stochastic Gradient Descent (SGD)" come from?

Herbert Robbins and Sutton Monro's 1951 paper "A Stochastic Approximation Method", whose stochastic approximation procedure is the mathematical foundation of SGD.

How is "Stochastic Gradient Descent (SGD)" used today?

The foundation of virtually all modern neural network training: plain SGD and its descendants (SGD with momentum, Adam, and other adaptive variants) are the default optimizers in every major deep learning framework.
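
In practice the update is rarely hand-coded; frameworks expose it directly. A hedged sketch using PyTorch's torch.optim.SGD with momentum (the data, model, and hyperparameters are illustrative):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy regression data (illustrative, not from the source)
X = torch.randn(200, 1)
y = 3.0 * X + 0.1 * torch.randn(200, 1)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
# SGD with momentum: the classic workhorse configuration
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Mini-batches drawn in shuffled order make the gradient "stochastic"
loader = DataLoader(TensorDataset(X, y), batch_size=8, shuffle=True)

for epoch in range(20):
    for xb, yb in loader:
        opt.zero_grad()                      # clear gradients from the last step
        loss_fn(model(xb), yb).backward()    # backprop: compute mini-batch gradients
        opt.step()                           # apply the SGD update
```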

Related Terms