Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners
D. Jensen and J. Neville (2002). Autocorrelation and linkage cause bias in evaluation of relational learners. Proceedings of The Twelfth International Conference on Inductive Logic Programming (ILP 2002). Springer-Verlag.
- Abstract
- Two common characteristics of relational data sets, concentrated linkage and relational autocorrelation, can cause traditional methods of evaluation to greatly overestimate the accuracy of induced models on test sets. We identify these characteristics, define quantitative measures of their severity, and explain how they produce this bias. We show how linkage and autocorrelation affect estimates of model accuracy by applying FOIL to synthetic data and to data drawn from the Internet Movie Database. We show how a modified sampling procedure can eliminate the bias.
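The abstract states the effect but not its mechanism in code. The following is a hypothetical Python sketch, not the paper's experiment (which applies FOIL to synthetic data and IMDb data). It assumes that every object links to exactly one "group" object, such as a studio (concentrated linkage). It also assumes that object labels mostly copy their group's label (high relational autocorrelation) and that a memorizing learner uses group identity as its only feature. Finally, it assumes the modified sampling procedure can be approximated by assigning whole groups to either the training set or the test set, so no group spans both. The paper's exact procedure and severity measures may differ. All function names and parameters (`make_data`, `fit`, `random_split`, `grouped_split`, `agree`) are illustrative.

```python
import random
from collections import Counter, defaultdict

def make_data(n_groups=500, per_group=10, agree=0.9, seed=0):
    """Hypothetical synthetic relational data: each object links to exactly one
    group (concentrated linkage), and each object's label equals its group's
    label with probability `agree` (relational autocorrelation)."""
    rng = random.Random(seed)
    objects = []  # list of (group_id, label) pairs
    for g in range(n_groups):
        group_label = rng.random() < 0.5
        for _ in range(per_group):
            label = group_label if rng.random() < agree else not group_label
            objects.append((g, label))
    return objects

def fit(train):
    """Memorizing learner: predict the majority label seen for an object's group;
    unseen groups (and ties) fall back to the overall training majority."""
    by_group = defaultdict(Counter)
    for g, y in train:
        by_group[g][y] += 1
    default = Counter(y for _, y in train).most_common(1)[0][0]
    table = {g: c[True] > c[False] for g, c in by_group.items() if c[True] != c[False]}
    return lambda g: table.get(g, default)

def accuracy(model, test):
    return sum(model(g) == y for g, y in test) / len(test)

def random_split(objects, test_frac=0.3, seed=1):
    """Conventional split: objects are sampled independently, so most test
    objects share a group with some training object."""
    rng = random.Random(seed)
    shuffled = objects[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def grouped_split(objects, test_frac=0.3, seed=1):
    """Assumed modified sampling: whole groups go to train or test, so the
    learner cannot exploit group identity seen during training."""
    rng = random.Random(seed)
    groups = sorted({g for g, _ in objects})
    rng.shuffle(groups)
    test_groups = set(groups[: int(len(groups) * test_frac)])
    train = [o for o in objects if o[0] not in test_groups]
    test = [o for o in objects if o[0] in test_groups]
    return train, test

objects = make_data()
for name, split in [("random", random_split), ("grouped", grouped_split)]:
    train, test = split(objects)
    print(name, round(accuracy(fit(train), test), 3))
```

Under the random split, the memorizing learner scores close to the within-group agreement rate, even though group identity says nothing about objects in unseen groups. Under the grouped split, its accuracy falls to roughly chance. In this toy setting, the gap between the two numbers is the kind of optimistic bias the abstract describes.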