Adjusting for Multiple Comparisons in Decision Tree Pruning
David Jensen and Matt Schmill, "Adjusting for Multiple Comparisons
in Decision Tree Pruning." To appear in Proceedings of the Third
International Conference on Knowledge Discovery and Data Mining.
August 1997.
- Abstract
- Pruning is a common technique to avoid overfitting in decision
trees. Most pruning techniques do not account for one important
factor -- multiple comparisons. Multiple comparisons occur when
an induction algorithm examines several candidate models and selects
the one that best accords with the data. Making multiple comparisons
produces incorrect inferences about model accuracy. We examine
a method that adjusts for multiple comparisons when pruning decision
trees -- Bonferroni pruning. In experiments with artificial and
realistic datasets, Bonferroni pruning produces smaller trees
that are at least as accurate as trees pruned using other common
approaches.
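As a rough illustration of the adjustment the abstract refers to, the sketch below applies the standard Bonferroni correction to the p-value of the best of several candidate splits. It is a minimal example under stated assumptions, not the procedure from the paper: the function names `bonferroni_adjust` and `keep_split`, the default `alpha` of 0.05, and the idea of testing one split at a time are all hypothetical choices made for this example.

```python
def bonferroni_adjust(p_value: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p-value.

    Upper bound on the probability that at least one of n_comparisons
    candidates would reach a p-value this small by chance alone.
    """
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return min(1.0, p_value * n_comparisons)


def keep_split(p_value: float, n_candidates: int, alpha: float = 0.05) -> bool:
    """Hypothetical pruning rule: keep a split only if its p-value is still
    significant after adjusting for the number of candidate splits examined."""
    return bonferroni_adjust(p_value, n_candidates) <= alpha


# A split with p = 0.01 looks significant when considered alone, but not
# when it was chosen as the best of 20 candidates.
print(keep_split(0.01, 1))
print(keep_split(0.01, 20))
```

The adjusted value grows with the number of candidates examined, so a split chosen from many alternatives needs stronger evidence to survive pruning.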
- Text
- A Postscript version of this paper is available (220K). An earlier, longer version
of this paper was presented at AI & Statistics 1997.