Stochastic Gradient Descent (SGD)

What is Stochastic Gradient Descent (SGD)?

An optimization algorithm that minimizes a loss function by iteratively stepping in the direction of steepest descent, i.e., opposite the gradient. Instead of computing the gradient over the entire dataset, SGD estimates it from a single randomly drawn example or a small mini-batch, making each update cheap and training scalable. The core update is θ ← θ − η∇L(θ), where η is the learning rate.
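A minimal sketch of mini-batch SGD on a toy linear-regression problem. The data, learning rate, and batch size here are illustrative assumptions, not part of the definition above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))           # 1000 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)                          # parameters to learn
lr, batch_size = 0.1, 32                 # assumed hyperparameters

for epoch in range(20):
    perm = rng.permutation(len(X))       # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Mean-squared-error gradient on the mini-batch: a noisy
        # but unbiased estimate of the full-dataset gradient.
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
        w -= lr * grad                   # step opposite the gradient

print(w)  # approaches [2.0, -1.0, 0.5]
```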

Where did the term "Stochastic Gradient Descent (SGD)" come from?

"Gradient descent" traces back to Cauchy's 1847 method of steepest descent; the "stochastic" part comes from Robbins and Monro's 1951 work on stochastic approximation, which replaced exact gradients with noisy estimates drawn from samples. The combination became the engine that powers neural network training.

How is "Stochastic Gradient Descent (SGD)" used today?

Adaptive variants such as Adam and AdamW, which build on the same SGD update, are used in virtually all LLM and vision-model training.
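A hedged sketch of what this looks like in practice, using PyTorch's torch.optim.AdamW on a placeholder model; the model, data, and hyperparameters below are illustrative assumptions, not a prescribed recipe:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                  # toy stand-in for a real network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
loss_fn = nn.MSELoss()

X = torch.randn(256, 3)                  # synthetic training batch
y = X @ torch.tensor([[2.0], [-1.0], [0.5]])

for step in range(100):
    optimizer.zero_grad()                # clear accumulated gradients
    loss = loss_fn(model(X), y)
    loss.backward()                      # backprop computes gradients
    optimizer.step()                     # Adam-style adaptive SGD update
```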

Related Terms