Adjusting for Multiple Testing in Decision Tree Pruning

D. Jensen, "Adjusting for Multiple Testing in Decision Tree Pruning." Preliminary Papers of the Sixth International Workshop on Artificial Intelligence and Statistics. January 1997. pp. 295-302.

Abstract
Overfitting is a widely observed pathology of induction algorithms. For induction algorithms that build decision trees, pruning is a common approach to correct overfitting. Most common pruning techniques do not account for one potentially important factor: multiple comparisons. Multiple comparisons occur whenever an induction algorithm examines several candidate models and selects the one that best accords with the data. Making multiple comparisons produces systematic overestimates of accuracy. This paper empirically examines the importance of accounting for multiple comparisons when evaluating models. Specifically, it examines the effectiveness of one particular pruning method that does account for multiple comparisons: Bonferroni pruning. Based on experiments with artificial and realistic datasets, Bonferroni pruning produces trees that are smaller and at least as accurate as trees pruned using several other common approaches.
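The effect described in the abstract can be illustrated with a small sketch. It is not the paper's implementation, and the abstract does not specify how Bonferroni pruning is carried out in detail. The function names (bonferroni_adjust, keep_split, best_of_k_accuracy) and the 0.05 threshold are assumptions made for illustration. The sketch shows the standard Bonferroni bound, in which a p-value is multiplied by the number of candidates compared, and a simulation in which the best of several chance-level classifiers appears better than chance.

```python
import random


def bonferroni_adjust(p_value: float, n_comparisons: int) -> float:
    """Standard Bonferroni bound: scale a p-value by the number of
    comparisons made, capped at 1.0."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return min(1.0, p_value * n_comparisons)


def keep_split(p_value: float, n_candidates: int, alpha: float = 0.05) -> bool:
    """Hypothetical pruning rule: keep a split only if its p-value is
    still significant after adjusting for the number of candidate
    splits that were examined to find it."""
    return bonferroni_adjust(p_value, n_candidates) <= alpha


def best_of_k_accuracy(k: int, n_examples: int, rng: random.Random) -> float:
    """Training accuracy of the best of k coin-flip classifiers.

    Every classifier has a true accuracy of 0.5, so any excess over 0.5
    in the returned value is due purely to selecting the maximum."""
    best_correct = 0
    for _ in range(k):
        correct = sum(rng.random() < 0.5 for _ in range(n_examples))
        best_correct = max(best_correct, correct)
    return best_correct / n_examples


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (1, 10, 100):
        print(k, best_of_k_accuracy(k, 100, rng))
    # A split with p = 0.01 survives when 3 candidates were examined,
    # but not when 20 were examined.
    print(keep_split(0.01, 3), keep_split(0.01, 20))
```

Under these assumptions, a split that looks significant in isolation can fail the adjusted test once the number of candidates examined is taken into account, which is why an adjusted criterion tends to yield smaller trees.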
Text
A PostScript version of this paper is available on request.

