Dissertation
D. Jensen, Induction with Randomization Testing: Decision-Oriented
Analysis of Large Data Sets. Department of Engineering and Policy,
Washington University, St. Louis, Missouri, Doctoral Dissertation,
1992.
- Abstract
- Induction systems are computer-based tools that aid the construction
of useful models from data. Existing systems are subject to overfitting
-- a tendency to produce models with unnecessary structure. Accurate
statistical significance testing could prevent overfitting, but
nearly all existing statistical significance tests are not appropriate
for induction systems. One approach, randomization testing, can
be extended to meet the challenges posed by induction systems.
Experiments indicate that a system with randomization testing
can successfully combat overfitting. Models produced by the system
are as accurate as, but significantly simpler than, models produced
by other systems.
- Text
- A complete PostScript version is available (500K).
- Citations
- This dissertation is cited in:
- Cullen Schaffer, "Overfitting Avoidance as Bias," Machine Learning 10:153-178, 1993.
- Links
- Washington University.
- Committee
- Dr. William P. Darby, Department of Engineering and Policy (Advisor)
Dr. William Ball, Department of Computer Science
Dr. Robert P. Morgan, Department of Engineering and Policy
Dr. Lee Robins, Washington University Medical School
Dr. Edward Spitznagel, Department of Mathematics and Statistics