Empirical Risk Minimization (ERM)

What is Empirical Risk Minimization (ERM)?

Empirical Risk Minimization (ERM) is a foundational principle in machine learning that guides the training of models. It states that the best model is the one that minimizes the 'empirical risk': the average loss or error over the training examples. In essence, instead of trying to minimize the 'true risk' (the expected error over all possible data, which is unknown because the underlying data distribution is unknown), we use the training data as a proxy and select the hypothesis that performs best on it. This is the principle behind training most supervised learning algorithms, from linear regression to deep neural networks.
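The idea can be made concrete in a few lines: compute the average loss of each candidate hypothesis on the training set, and pick the one with the lowest value. This is a minimal sketch with a made-up toy dataset, squared loss, and a tiny hypothesis class of lines through the origin; real systems search much larger hypothesis classes with gradient-based optimization.

```python
def empirical_risk(h, data, loss):
    """Average loss of hypothesis h over the training set."""
    return sum(loss(h(x), y) for x, y in data) / len(data)

def squared_loss(y_pred, y_true):
    return (y_pred - y_true) ** 2

# Toy training set, roughly y = 2x (illustrative values)
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# ERM over a small hypothesis class {h(x) = w*x : w in candidates}:
# choose the slope with the lowest average training loss.
candidates = [0.5, 1.0, 2.0, 3.0]
best_w = min(candidates,
             key=lambda w: empirical_risk(lambda x: w * x, data, squared_loss))
# best_w == 2.0 on this toy data
```

The key point is that only the training data enters the computation: `empirical_risk` is the quantity ERM minimizes, standing in for the true risk we cannot evaluate.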

Where did the term "Empirical Risk Minimization (ERM)" come from?

The principle of ERM was formalized by Vladimir Vapnik and Alexey Chervonenkis in the 1960s and 1970s as a core part of statistical learning theory (or VC theory). Their work provided the theoretical justification for why minimizing error on a limited training set can lead to a model that generalizes well to unseen data, provided that the model's complexity is properly controlled.

How is "Empirical Risk Minimization (ERM)" used today?

ERM is the default learning strategy for the vast majority of machine learning models. Every time a model is trained by minimizing a loss function on a dataset (like Mean Squared Error in regression or Cross-Entropy in classification), it is an application of the ERM principle. However, a key challenge with ERM is the risk of overfitting, where the model fits the training data too closely, including its noise, and fails to generalize. To combat this, ERM is often combined with regularization techniques that penalize model complexity; trading off training error against complexity in this way is the idea behind a related principle called Structural Risk Minimization (SRM).
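Regularized ERM adds a complexity penalty to the average training loss before minimizing. The sketch below fits a one-parameter linear model by gradient descent on squared loss plus an L2 penalty; the toy data, learning rate, and penalty strength are all illustrative choices, not prescribed values.

```python
def fit(data, lam, lr=0.01, steps=5000):
    """Minimize (1/n) * sum (w*x - y)^2 + lam * w^2 by gradient descent."""
    w = 0.0
    n = len(data)
    for _ in range(steps):
        # Gradient of the regularized empirical risk with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / n + 2 * lam * w
        w -= lr * grad
    return w

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

w_erm = fit(data, lam=0.0)  # plain ERM: lowest training error
w_reg = fit(data, lam=1.0)  # regularized: weight shrunk toward zero
```

With `lam=0.0` this is pure ERM; with `lam > 0` the fitted weight is pulled toward zero, accepting slightly higher training error in exchange for a simpler hypothesis, which is the trade-off SRM formalizes.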

Related Terms