Gini Impurity

What is Gini Impurity?

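Gini Impurity measures the probability that a randomly chosen element of a node would be misclassified if it were labeled at random according to the node's class distribution. For a node whose elements fall into classes with proportions p_1, ..., p_K, it equals 1 - (p_1^2 + ... + p_K^2). It is a key metric in decision tree algorithms for choosing how to split a node. Each candidate split is scored by the size-weighted Gini Impurity of the child nodes it produces, and the split with the lowest value is preferred. A value of 0 represents a pure node, in which all elements belong to a single class.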
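The definition and the split-scoring rule can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation. The function names `gini_impurity`, `split_gini`, and `best_threshold` are hypothetical. The threshold search assumes a single numeric feature and tries the midpoints between consecutive distinct values, which is one common choice among several.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity of one node: 1 - sum_k p_k^2, where p_k is the share of class k."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention assumed here: an empty node has no impurity
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Score of a candidate split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_threshold(xs: List[float], ys: List[Hashable]) -> Tuple[Optional[float], float]:
    """Hypothetical helper: try each midpoint between sorted distinct feature
    values and return (threshold, weighted Gini) for the lowest-impurity split.
    Returns (None, parent impurity) if no split improves on the parent node."""
    best: Tuple[Optional[float], float] = (None, gini_impurity(ys))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = split_gini(left, right)
        if score < best[1]:
            best = (t, score)
    return best


print(gini_impurity(["a", "a", "a"]))  # pure node
print(gini_impurity(["a", "b"]))       # two classes, evenly mixed
print(best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

On the toy data the search settles on the midpoint 2.5, which separates the two classes perfectly and drives the weighted impurity to 0.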

Where did the term "Gini Impurity" come from?

The concept is derived from the Gini coefficient developed by Italian statistician Corrado Gini in 1912. It was later adopted as a splitting criterion in Classification and Regression Trees (CART), introduced by Breiman, Friedman, Olshen, and Stone in 1984.

How is "Gini Impurity" used today?

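Gini Impurity is a widely used splitting criterion in classification trees and in ensembles built from them, such as Random Forests. Gradient boosting models are a common exception: their trees are usually regression trees fitted to gradients or residuals, so they are typically split on squared-error-style criteria rather than Gini Impurity. Gini Impurity is often preferred over entropy (the basis of information gain) because it is slightly cheaper to compute, since it needs only squares and sums and no logarithms.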
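The cost difference is visible when the two criteria are written side by side. Gini needs only multiplications and a sum, while entropy needs a `log2` call for every non-zero class probability. The sketch below is illustrative: `gini`, `entropy`, `weighted`, and the two example splits are hypothetical. It also scores two candidate splits of the same node under both criteria, since in practice the two often, though not always, rank splits the same way.

```python
import math
from typing import Callable, List, Sequence


def gini(probs: Sequence[float]) -> float:
    """Gini impurity from class probabilities: only squares and a sum."""
    return 1.0 - sum(p * p for p in probs)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits: one log2 call per non-zero class probability."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def weighted(criterion: Callable[[Sequence[float]], float],
             children: List[List[int]]) -> float:
    """Size-weighted impurity of a split. Each child is a list of per-class
    counts and is assumed to be non-empty."""
    total = sum(sum(child) for child in children)
    score = 0.0
    for child in children:
        n = sum(child)
        score += (n / total) * criterion([k / n for k in child])
    return score


# Two hypothetical splits of the same 10-sample node (5 samples of each class).
SPLIT_A = [[4, 1], [1, 4]]
SPLIT_B = [[5, 3], [0, 2]]

for name, split in (("A", SPLIT_A), ("B", SPLIT_B)):
    print(name, round(weighted(gini, split), 3), round(weighted(entropy, split), 3))
```

Both functions are 0 for a pure node and peak for an even two-class mix (0.5 for Gini, 1 bit for entropy). In this example both criteria give split A a lower score than split B, so either one would choose A.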
