Multiple Comparisons in Induction Algorithms
David Jensen and Paul R. Cohen (2000). "Multiple Comparisons in Induction
Algorithms." Machine Learning 38: 309-338.
- Abstract
- A single mechanism is responsible for three pathologies of induction
algorithms: attribute selection errors, overfitting, and oversearching.
In each pathology, induction algorithms compare multiple items based on
scores from an evaluation function and select the item with the
maximum score. We call this a multiple comparison procedure (MCP). We
analyze the statistical properties of MCPs and show how failure to
adjust for these properties leads to the pathologies. We also discuss
approaches that can control pathological behavior, including Bonferroni
adjustment, randomization testing, and cross-validation.
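
The effect described in the abstract can be illustrated with a small simulation. This is a minimal sketch and is not taken from the paper. It assumes the candidate items are independent and that none carries real signal, so each item's p-value is uniform on [0, 1]. Selecting the item with the maximum score is then equivalent to selecting the smallest p-value. The function names `max_score_false_positive_rate` and `bonferroni_alpha` are hypothetical and chosen only for illustration.

```python
import random


def max_score_false_positive_rate(n_items, alpha, trials=20000, seed=0):
    """Estimate how often the best of n_items pure-noise items looks significant.

    Assumption (not from the paper): items are independent and all null,
    so each item's p-value is an independent uniform draw on [0, 1].
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Picking the maximum score corresponds to picking the minimum p-value.
        best_p = min(rng.random() for _ in range(n_items))
        if best_p < alpha:
            hits += 1
    return hits / trials


def bonferroni_alpha(alpha, n_items):
    """Per-item threshold under a Bonferroni adjustment for n_items comparisons."""
    return alpha / n_items


alpha = 0.05
n_items = 20

# Testing the selected item at the unadjusted threshold ignores the selection step.
unadjusted = max_score_false_positive_rate(n_items, alpha)

# Testing it at the Bonferroni-adjusted threshold accounts for the n_items comparisons.
adjusted = max_score_false_positive_rate(n_items, bonferroni_alpha(alpha, n_items))

# Closed form for independent null items: P(min p < alpha) = 1 - (1 - alpha)^n.
analytic = 1 - (1 - alpha) ** n_items

print(f"unadjusted false positive rate: {unadjusted:.3f}")
print(f"analytic value for comparison:  {analytic:.3f}")
print(f"Bonferroni-adjusted rate:       {adjusted:.3f}")
```

Under these assumptions, with 20 items and a nominal threshold of 0.05, the closed form gives a false positive rate of about 0.64 for the selected item, while the Bonferroni-adjusted threshold brings the rate back to roughly the nominal level. Correlated items would behave differently, which is one reason randomization testing and cross-validation are also worth considering.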
- Text
- PostScript (1.6M) and PDF (500K) versions of this paper are available.