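A strategy for adjusting the learning rate over the course of training (e.g., warmup, cosine decay). A good schedule can speed up convergence: larger steps early on make fast progress and can carry the optimizer past poor regions of the loss surface, while smaller steps later let it settle into a minimum instead of oscillating around it.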
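Critical for training stability, especially warmup, which avoids large, destabilizing updates in the first steps.
Used in nearly every modern large-scale training run.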
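A minimal sketch of one common schedule, linear warmup followed by cosine decay, written as a plain Python function of the step index. The function name and its parameters (`base_lr`, `warmup_steps`, `total_steps`, `min_lr`) are illustrative assumptions, not any particular library's API.

```python
import math


def warmup_cosine_lr(step: int, base_lr: float, warmup_steps: int,
                     total_steps: int, min_lr: float = 0.0) -> float:
    """Hypothetical helper: learning rate at `step` for linear warmup + cosine decay."""
    if step < warmup_steps:
        # Linear warmup: ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1] so the
    # rate stays at min_lr if training runs past total_steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay: factor goes smoothly from 1 (start) to 0 (end).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


# Example: 100 warmup steps, 1000 total steps, peak learning rate 3e-4.
schedule = [warmup_cosine_lr(s, 3e-4, 100, 1000) for s in range(1000)]
```

In practice the returned value would be written into the optimizer before each update. Frameworks also ship ready-made schedulers; PyTorch, for instance, provides `LinearLR` and `CosineAnnealingLR` in `torch.optim.lr_scheduler`, which can be chained with `SequentialLR`.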