Adjusting for Multiple Comparisons in Decision Tree Pruning

David Jensen and Matt Schmill, "Adjusting for Multiple Comparisons in Decision Tree Pruning." To appear in Proceedings of the Third International Conference on Knowledge Discovery and Data Mining. August 1997.

Abstract
Pruning is a common technique to avoid overfitting in decision trees. Most pruning techniques do not account for one important factor -- multiple comparisons. Multiple comparisons occur when an induction algorithm examines several candidate models and selects the one that best accords with the data. Failing to adjust for multiple comparisons produces incorrect inferences about model accuracy. We examine a method that adjusts for multiple comparisons when pruning decision trees -- Bonferroni pruning. In experiments with artificial and realistic datasets, Bonferroni pruning produces smaller trees that are at least as accurate as trees pruned using other common approaches.
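The core idea of a Bonferroni adjustment can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the abstract does not say which test statistic is used or how comparisons are counted. The sketch assumes a p-value has already been computed for the best of `k` candidate splits at a node, and that `k` is the number of comparisons. The function names (`family_wise_error`, `bonferroni_adjust`, `keep_split`) and the default `alpha = 0.05` are hypothetical.

```python
def family_wise_error(alpha: float, k: int) -> float:
    """Chance that at least one of k independent tests on pure noise
    reaches significance level alpha (why an unadjusted 'best' split misleads)."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_adjust(p_value: float, num_comparisons: int) -> float:
    """Bonferroni-adjusted p-value: multiply by the number of comparisons, capped at 1."""
    if num_comparisons < 1:
        raise ValueError("num_comparisons must be >= 1")
    return min(1.0, p_value * num_comparisons)


def keep_split(best_p_value: float, num_candidate_splits: int, alpha: float = 0.05) -> bool:
    """Hypothetical pruning rule: keep the node's best split only if it stays
    significant after adjusting for every candidate split examined at that node."""
    return bonferroni_adjust(best_p_value, num_candidate_splits) <= alpha


if __name__ == "__main__":
    # With 20 candidate splits, noise alone yields a "significant" split about 64% of the time.
    print(round(family_wise_error(0.05, 20), 2))
    # A best split with p = 0.01 passes an unadjusted test but is pruned after adjustment.
    print(keep_split(0.01, 1), keep_split(0.01, 20))
```

Dividing the significance threshold by `k` is equivalent to multiplying the p-value by `k`. Either form makes a split chosen from many candidates meet a stricter standard, which tends to produce the smaller trees the abstract describes.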
Text
A Postscript version of this paper is available (220K). A longer and earlier version of this paper was presented at AI & Statistics 1997.
