Batch Size

What is Batch Size?

The number of training examples processed together in one iteration. The model updates its weights once per batch. Larger batches provide a more stable gradient estimate but require more VRAM, while smaller batches add noise that can help escape local minima.
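The update-once-per-batch behavior can be sketched with a minimal mini-batch SGD loop in numpy. This is an illustrative toy (single-weight linear regression with made-up data), not a production training loop; all names here are invented for the example.

```python
import numpy as np

# Toy data: y = 3x + noise. We fit a single weight w with mini-batch SGD.
rng = np.random.default_rng(0)
X = rng.normal(size=256)
y = 3.0 * X + rng.normal(scale=0.1, size=256)

def train(batch_size, lr=0.1, epochs=20):
    w = 0.0
    for _ in range(epochs):
        # Shuffle, then walk through the data in chunks of batch_size.
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of mean squared error over this batch only;
            # the weight is updated once per batch.
            grad = 2.0 * np.mean((w * X[batch] - y[batch]) * X[batch])
            w -= lr * grad
    return w

w = train(batch_size=32)
print(w)  # a value close to the true slope of 3.0
```

A smaller `batch_size` here means more (noisier) updates per epoch; a larger one means fewer, smoother updates, which is the trade-off described above.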

Where did the term "Batch Size" come from?

The term comes from gradient-descent terminology: "batch" gradient descent originally meant computing the gradient over the entire dataset before each update, while "mini-batch" Stochastic Gradient Descent (SGD) uses a fixed-size subset per update. The size of that subset is the batch size, making it a fundamental hyperparameter of SGD and its variants.

How is "Batch Size" used today?

Batch size is a key setting for balancing training speed and stability, and in practice it is often capped by available GPU memory.
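When the desired batch does not fit in GPU memory, a common workaround is gradient accumulation: compute gradients over several small "micro-batches" and average them before a single weight update, which mimics one step with a larger effective batch. Below is a minimal numpy sketch of the idea on a toy linear-regression problem; all data and function names are illustrative, not from any particular framework.

```python
import numpy as np

# Toy data: y = X @ true_w exactly (no noise), so SGD can recover true_w.
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w

def grad_mse(w, xb, yb):
    # Mean-squared-error gradient over one micro-batch.
    return 2.0 * xb.T @ (xb @ w - yb) / len(xb)

def step_accumulated(w, lr, micro_batches):
    # Average the gradients of several micro-batches, then update once:
    # the same update a single large batch of all these examples would give.
    g = np.mean([grad_mse(w, xb, yb) for xb, yb in micro_batches], axis=0)
    return w - lr * g

# Split an effective batch of 64 into four micro-batches of 16.
micro = [(X[i:i + 16], y[i:i + 16]) for i in range(0, 64, 16)]
w = np.zeros(4)
for _ in range(200):
    w = step_accumulated(w, lr=0.1, micro_batches=micro)

print(w)  # close to true_w
```

Because each micro-batch here has equal size, averaging their gradients is mathematically identical to one gradient over the full 64-example batch, but only 16 examples' activations would need to be in memory at a time.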

Related Terms