Feature Selection Biases in Relational Learning

D. Jensen and J. Neville (2002). Feature selection biases in relational learning. Invited talk to the Machine Intelligence 19 Workshop. Withersdane Conference Centre, Imperial College at Wye. September 18-20.

Abstract
Recently, we showed how two common structural characteristics of relational data can cause learning algorithms to mistakenly conclude that correlation exists between a class label and a relational feature (Jensen & Neville 2002). This paper extends that work to three additional cases where structural characteristics of relational data can cause an inappropriate bias toward certain classes of features. Collectively, these cases can affect any learning algorithm that uses aggregation functions (e.g., maximum, minimum, average, mode, count, or exists) to construct relational features. Such algorithms include probabilistic relational models and many algorithms for learning models in first-order logic. We provide proofs of these biases and discuss their effect on algorithms for learning probabilistic models from relational data. We also summarize and discuss all our recent work on the feature selection biases introduced by the relational structure of training data.
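As an illustration of the kind of feature construction the abstract refers to, the following minimal sketch applies the listed aggregation functions to a bag of related attribute values. It is not taken from the paper: the function name `aggregate_features`, the movie and actor data, and the dictionary layout are all hypothetical, chosen only to show how a variable-sized set of related values is collapsed into fixed-size features.

```python
from collections import Counter
from statistics import mean


def aggregate_features(values):
    """Collapse a bag of related attribute values into fixed-size features.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not values:
        # No related objects: only count and exists are defined.
        return {
            "count": 0,
            "exists": False,
            "maximum": None,
            "minimum": None,
            "average": None,
            "mode": None,
        }
    return {
        "count": len(values),
        "exists": True,
        "maximum": max(values),
        "minimum": min(values),
        "average": mean(values),
        # most_common(1) returns [(value, frequency)] for the top value.
        "mode": Counter(values).most_common(1)[0][0],
    }


# Hypothetical relational data: each movie links to a varying number of actors,
# and each actor contributes one attribute value (age).
actor_ages = {
    "movie_a": [30, 30, 60],
    "movie_b": [45],
    "movie_c": [],
}

features = {movie: aggregate_features(ages) for movie, ages in actor_ages.items()}
```

In this toy data the number of related objects differs from movie to movie, so each aggregate summarizes a different amount of evidence. A learner that scores such features against a class label sees only the aggregated value, not the size or structure of the bag behind it.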
