Knowledge Discovery in Databases 1991

D. Jensen, "Knowledge Discovery Through Induction with Randomization Testing," Proceedings of the 1991 Knowledge Discovery in Databases Workshop, G. Piatetsky-Shapiro (Ed.), 148-159. Menlo Park, CA: AAAI, 1991.
Abstract
This paper describes an approach that combines elements of both machine learning and statistics. The approach -- Induction with Randomization Testing (IRT) -- provides an environment for discovery of new knowledge through flexible interaction with data and models. The approach allows investigators to use their own skills where they are strong and provides automated assistance where those skills are weak. Of particular importance is how IRT tests models. The approach estimates the probability that apparent improvement in the accuracy of a given model is due to chance alone. These estimates, provided by randomization testing, protect against constructing models of inappropriate complexity.
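As a rough illustration of the randomization testing mentioned in the abstract, the Python sketch below estimates how often a model's apparent accuracy improvement would arise by chance alone. It is not the IRT implementation from the paper. The function names, the majority-class "models," and the toy data are all assumptions made for illustration. The idea is to shuffle the candidate variable so that any real association with the outcome is destroyed, recompute the improvement, and count how often the shuffled improvement matches or exceeds the observed one.

```python
import random
from collections import Counter, defaultdict


def baseline_accuracy(labels):
    """Accuracy of always predicting the most common label (the simpler model)."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def split_accuracy(feature, labels):
    """Accuracy of predicting the most common label within each feature value
    (a hypothetical, slightly more complex model)."""
    groups = defaultdict(Counter)
    for f, y in zip(feature, labels):
        groups[f][y] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(labels)


def randomization_p_value(feature, labels, n_rounds=1000, seed=0):
    """Return (observed improvement, estimated probability that an improvement
    at least this large would occur by chance alone)."""
    rng = random.Random(seed)
    base = baseline_accuracy(labels)
    observed = split_accuracy(feature, labels) - base

    shuffled = list(feature)
    at_least_as_large = 0
    for _ in range(n_rounds):
        # Shuffling breaks any real link between the feature and the labels.
        rng.shuffle(shuffled)
        if split_accuracy(shuffled, labels) - base >= observed:
            at_least_as_large += 1

    # The +1 terms count the observed arrangement itself, so the estimate
    # is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_rounds + 1)


if __name__ == "__main__":
    # Toy data (invented for this sketch): the feature predicts the label perfectly.
    feature = [0] * 20 + [1] * 20
    labels = ["a"] * 20 + ["b"] * 20
    improvement, p = randomization_p_value(feature, labels)
    print(f"improvement={improvement:.2f}, estimated p={p:.4f}")
```

A small estimated probability suggests the added complexity reflects real structure in the data. A large one suggests the extra complexity is not justified, which is the sense in which such tests guard against models of inappropriate complexity.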
Text
A PostScript version of the paper is available (120K). The PostScript version does not contain a series of screenshots that appear in the printed version of the paper, but is otherwise complete.
Figures
The figures missing from the PostScript version of the paper are available separately.
Citations
This article is cited in:
Gregory Piatetsky-Shapiro, Christopher Matheus, Padhraic Smyth, and Ramasamy Uthurusamy, "KDD-93: Progress and Challenges in Knowledge Discovery in Databases," AI Magazine, Fall 1994.

