Knowledge Discovery in Databases 1991
D. Jensen, "Knowledge Discovery Through Induction with Randomization Testing,"
Proceedings of the 1991 Knowledge Discovery in Databases Workshop, G. Piatetsky-Shapiro (Ed.), pp. 148-159. Menlo Park, CA: AAAI, 1991.
- Abstract
- This paper describes an approach that combines elements of both
machine learning and statistics. The approach -- Induction with
Randomization Testing (IRT) -- provides an environment for discovery
of new knowledge through flexible interaction with data and models.
The approach allows investigators to use their own skills where
they are strong and provides automated assistance where those
skills are weak. Of particular importance is how IRT tests models.
The approach estimates the probability that apparent improvement
in the accuracy of a given model is due to chance alone. These
estimates, provided by randomization testing, protect against
constructing models of inappropriate complexity.
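
The idea of estimating whether an apparent accuracy gain is due to chance can be illustrated with a generic randomization (permutation) test. The sketch below is not the IRT procedure from the paper, whose details are not given in this abstract. It assumes a single categorical feature, a simple majority-label rule as the "model", and label shuffling as the randomization scheme. The names `majority_rule_accuracy` and `randomization_p_value` are hypothetical.

```python
import random


def majority_rule_accuracy(feature, labels):
    """Accuracy of a rule that predicts, for each feature value,
    the most common label observed with that value."""
    counts = {}
    for x, y in zip(feature, labels):
        by_label = counts.setdefault(x, {})
        by_label[y] = by_label.get(y, 0) + 1
    correct = sum(max(by_label.values()) for by_label in counts.values())
    return correct / len(labels)


def randomization_p_value(feature, labels, trials=1000, seed=0):
    """Estimate how often shuffled labels give accuracy at least as
    high as the accuracy observed on the real labels."""
    rng = random.Random(seed)
    observed = majority_rule_accuracy(feature, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(trials):
        # Shuffling breaks any real link between feature and label.
        rng.shuffle(shuffled)
        if majority_rule_accuracy(feature, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_good + 1) / (trials + 1)


if __name__ == "__main__":
    labels = ["a"] * 20 + ["b"] * 20
    informative = [0] * 20 + [1] * 20   # tracks the label exactly
    uninformative = [0, 1] * 20         # unrelated to the label
    print(randomization_p_value(informative, labels))
    print(randomization_p_value(uninformative, labels))
```

A small estimated probability suggests the added term reflects real structure in the data. A large one suggests the apparent improvement could easily arise by chance, which is the kind of evidence the abstract describes as a guard against overly complex models.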
- Text
- A PostScript version of the paper is available (120K). It omits the
series of screenshots that appears in the printed version of the paper,
but is otherwise complete.
- Figures
- The figures missing from the PostScript version of the paper are
available separately.
- Citations
- This article is cited in:
Gregory Piatetsky-Shapiro, Christopher Matheus, Padhraic Smyth,
and Ramasamy Uthurusamy, "KDD-93: Progress and Challenges in Knowledge
Discovery in Databases," AI Magazine, Fall 1994.