In machine learning, a hyperparameter is a configuration variable that is set before the training process begins. Unlike model parameters, which are learned from the data during training (e.g., weights and biases in a neural network), hyperparameters control the learning process itself. Examples include the learning rate, the number of hidden layers in a neural network, the batch size, and the number of epochs. The choice of hyperparameters can have a significant impact on the model's performance.
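The distinction can be illustrated with a minimal training loop. In this sketch, the learning rate and number of epochs are hyperparameters fixed before training, while the weight `w` is a model parameter learned from the data; the dataset and values are illustrative assumptions, not taken from any particular model.

```python
# Hyperparameters: chosen by the practitioner before training begins.
learning_rate = 0.1
num_epochs = 200

# Toy dataset: y = 3x, so training should drive the weight toward 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

# Parameter: learned from the data during training.
w = 0.0

for _ in range(num_epochs):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # The learning rate (a hyperparameter) scales each update to w.
    w -= learning_rate * grad

print(round(w, 3))  # converges toward 3.0
```

Changing the learning rate here changes how training behaves (too large and the updates diverge, too small and convergence is slow) without changing what is being learned, which is exactly the role of a hyperparameter.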
The concept of hyperparameters became prominent with the formalization of machine learning practices. As models grew more complex, it became necessary to distinguish between the internal parameters learned by the model and the external, structural parameters set by the practitioner to guide the training process. This distinction is fundamental to designing and optimizing machine learning models.
Hyperparameter tuning is a critical and often time-consuming step in the machine learning workflow, essential for achieving optimal model performance. The challenge applies across all domains of machine learning, from traditional models to deep learning. Entire subfields and automated tools have emerged to address it, with common techniques including Grid Search, Random Search, and Bayesian Optimization.
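Random Search, one of the techniques mentioned above, can be sketched in a few lines: sample candidate configurations from a search space, score each one, and keep the best. The search space and the `score` function below are hypothetical placeholders; in a real workflow, `score` would train a model with the given hyperparameters and return its validation metric.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def score(learning_rate, batch_size):
    # Hypothetical stand-in for "train a model and return validation score";
    # this surrogate peaks near learning_rate = 0.01 and batch_size = 64.
    return -((math.log10(learning_rate) + 2) ** 2) - ((batch_size - 64) / 64) ** 2

# Illustrative search space: a log-uniform range for the learning rate and a
# discrete set of batch sizes.
search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, 0),
    "batch_size": lambda: random.choice([16, 32, 64, 128, 256]),
}

best_params, best_score = None, float("-inf")
for _ in range(50):  # the trial budget is itself a choice the practitioner makes
    params = {name: sample() for name, sample in search_space.items()}
    s = score(**params)
    if s > best_score:
        best_params, best_score = params, s

print(best_params)
```

Grid Search would instead enumerate a fixed lattice of values, and Bayesian Optimization would use the scores observed so far to choose where to sample next; Random Search sits between them in cost and is a common, strong baseline.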