Adjusting for Multiple Testing in Decision Tree Pruning
D. Jensen, "Adjusting for Multiple Testing in Decision Tree Pruning."
Preliminary Papers of the Sixth International Workshop on Artificial
Intelligence and Statistics. January 1997. pp. 295-302.
- Abstract
- Overfitting is a widely observed pathology of induction algorithms.
For induction algorithms that build decision trees, pruning is
a common approach to correcting overfitting. Most common pruning
techniques do not account for one potentially important factor:
multiple comparisons. Multiple comparisons occur whenever an
induction algorithm examines several candidate models and selects
the one that best accords with the data. Making multiple comparisons
produces systematic overestimates of accuracy. This paper empirically
examines the importance of accounting for multiple comparisons
when evaluating models. Specifically, it examines the effectiveness
of one particular pruning method that does account for multiple
comparisons -- Bonferroni pruning. In experiments with artificial
and realistic datasets, Bonferroni pruning produces trees that
are smaller than, and at least as accurate as, trees pruned using
several other common approaches.
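
The overestimation effect described in the abstract can be illustrated with a
small simulation. This is a minimal sketch, not the procedure used in the
paper: every candidate model here is pure noise with a true accuracy of 0.5,
and the function names, the sample size of 50, and the trial count are all
assumptions made for illustration. The last function shows the generic
Bonferroni adjustment (multiply a p-value by the number of comparisons); how
the paper applies it during pruning is not specified in this abstract.

```python
import random


def max_noise_accuracy(n_candidates, n_examples, rng):
    """Best apparent accuracy among candidates that are all pure noise."""
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate guesses at random, so its true accuracy is 0.5.
        correct = sum(rng.random() < 0.5 for _ in range(n_examples))
        best = max(best, correct / n_examples)
    return best


def mean_best_accuracy(n_candidates, n_examples=50, n_trials=500, seed=0):
    """Average, over many trials, of the best apparent accuracy."""
    rng = random.Random(seed)
    total = sum(
        max_noise_accuracy(n_candidates, n_examples, rng)
        for _ in range(n_trials)
    )
    return total / n_trials


def bonferroni_adjust(p_value, n_candidates):
    """Generic Bonferroni adjustment: scale the p-value, capped at 1."""
    return min(1.0, p_value * n_candidates)


if __name__ == "__main__":
    # Selecting the best of many noise candidates inflates apparent accuracy.
    for k in (1, 5, 20):
        print(k, round(mean_best_accuracy(k), 3))
    # A p-value that looks significant alone may not survive adjustment.
    print(bonferroni_adjust(0.01, 10))
```

With one candidate the average apparent accuracy stays near 0.5; as the number
of candidates examined grows, the best apparent accuracy rises even though no
candidate is actually better than chance. The adjustment compensates by
demanding stronger evidence when more candidates have been compared.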
- Text
- A PostScript version of this paper is available on request.