Overfitting Explained
P. R. Cohen and D. Jensen, "Overfitting Explained." Preliminary
Papers of the Sixth International Workshop on Artificial Intelligence
and Statistics. January 1997. pp. 115-122.
- Abstract
- Overfitting arises when model components are evaluated against
the wrong reference distribution. Most modeling algorithms iteratively
find the best of several components and then test whether this
component is good enough to add to the model. We show that for
independently distributed random variables, the reference distribution
for any one variable underestimates the reference distribution
for the highest-valued variable; thus variate values will appear
significant when they are not, and model components will be added
when they should not be added. We relate this problem to the well-known
statistical theory of multiple comparisons or simultaneous inference.
- Text
- A PostScript version of this paper is available (153K).
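
The abstract's central claim, that the best of several candidate scores should not be judged against the reference distribution for a single score, can be illustrated with a small sketch. This is not the authors' experiment. It assumes each candidate component's score is an independent standard normal variate under the null hypothesis, and the function names (`false_positive_rate`, `corrected_critical_value`, `simulate`) are hypothetical, chosen only for this illustration.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_candidates, alpha=0.05):
    """Chance that the best of n independent null scores exceeds the
    critical value that was derived for a single score at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_candidates


def corrected_critical_value(n_candidates, alpha=0.05):
    """Critical value taken from the distribution of the maximum of n
    independent standard normal scores (an assumption of this sketch)."""
    # P(max <= c) = F(c)^n, so solve F(c) = (1 - alpha)^(1/n) for c.
    per_score_level = (1.0 - alpha) ** (1.0 / n_candidates)
    return NormalDist().inv_cdf(per_score_level)


def simulate(n_candidates, alpha=0.05, trials=20000, seed=0):
    """Monte Carlo estimate of how often the highest null score passes
    a test calibrated for one score."""
    rng = random.Random(seed)
    naive_cutoff = NormalDist().inv_cdf(1.0 - alpha)
    hits = 0
    for _ in range(trials):
        best = max(rng.gauss(0.0, 1.0) for _ in range(n_candidates))
        if best > naive_cutoff:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 10, 50):
        print(n, false_positive_rate(n), simulate(n), corrected_critical_value(n))
```

Under these assumptions, `false_positive_rate(10)` is about 0.40: when the best of ten uninformative components is tested at a nominal 5% level, it appears significant roughly four times in ten. Taking the critical value from the distribution of the maximum, as `corrected_critical_value` does, restores the nominal error rate.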