Linkage and Autocorrelation Cause Feature Selection Bias in Relational Learning

D. Jensen and J. Neville (2002). Linkage and autocorrelation cause feature selection bias in relational learning. Proceedings of the Nineteenth International Conference on Machine Learning (ICML 2002). Morgan Kaufmann. pp. 259-266.

Abstract
Two common characteristics of relational data sets, concentrated linkage and relational autocorrelation, can cause learning algorithms to be strongly biased toward certain features, irrespective of their predictive power. We identify these characteristics, define quantitative measures of their severity, and explain how they produce this bias. We show how linkage and autocorrelation affect a representative algorithm for feature selection by applying the algorithm to synthetic data and to data drawn from the Internet Movie Database.
Text
A PDF version of this paper is available.

