Dissertation

D. Jensen, Induction with Randomization Testing: Decision-Oriented Analysis of Large Data Sets. Department of Engineering and Policy, Washington University, St. Louis, Missouri, Doctoral Dissertation, 1992.
Abstract
Induction systems are computer-based tools that aid the construction of useful models from data. Existing systems are subject to overfitting -- a tendency to produce models with unnecessary structure. Accurate statistical significance testing could prevent overfitting, but nearly all existing statistical significance tests are not appropriate for induction systems. One approach, randomization testing, can be extended to meet the challenges posed by induction systems. Experiments indicate that a system with randomization testing can successfully combat overfitting. Models produced by the system are as accurate as, but significantly simpler than, models produced by other systems.
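
To illustrate the general idea of randomization testing mentioned in the abstract, here is a minimal Python sketch of a generic permutation test. It is not the system described in the dissertation: the function names, the agreement-based score, and the toy data are all assumptions made for illustration. The sketch shuffles the class labels many times and repeats the search for the best candidate on each shuffle, so the estimated p-value reflects the fact that the best of several candidates was selected.

```python
import random


def best_score(columns, labels):
    """Return the score of the best candidate column.

    Hypothetical scoring rule: the fraction of rows on which a binary
    column agrees with the label, or disagrees, whichever is larger.
    """
    n = len(labels)
    scores = []
    for col in columns:
        agree = sum(1 for c, y in zip(col, labels) if c == y)
        scores.append(max(agree, n - agree) / n)
    return max(scores)


def randomization_p_value(columns, labels, trials=1000, seed=0):
    """Estimate how often shuffled labels yield a best score at least
    as high as the one observed on the real labels."""
    rng = random.Random(seed)
    observed = best_score(columns, labels)
    shuffled = list(labels)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real column/label relationship
        if best_score(columns, shuffled) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least + 1) / (trials + 1)


# Toy data (assumed): one column that tracks the label and two that do not.
labels = [0, 1] * 20
columns = [list(labels), [0] * 40, [1, 1, 0, 0] * 10]
p = randomization_p_value(columns, labels)
```

A small p-value suggests that the selected structure is unlikely to arise by chance alone. A system could use such a test to decide whether a candidate addition to a model is worth keeping.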

Text
A complete PostScript version is available (500K).

Citations
This dissertation is cited in:
Cullen Schaffer, "Overfitting Avoidance as Bias," Machine Learning 10:153-178, 1993.

Links
Washington University.

Committee
Dr. William P. Darby, Department of Engineering and Policy (Advisor)
Dr. William Ball, Department of Computer Science
Dr. Robert P. Morgan, Department of Engineering and Policy
Dr. Lee Robins, Washington University Medical School
Dr. Edward Spitznagel, Department of Mathematics and Statistics

