Why Stacked Models Perform Effective Collective Classification

Fast, A. and D. Jensen (2008). Why Stacked Models Perform Effective Collective Classification. To appear in Proceedings of IEEE International Conference on Data Mining. (Also appears as University of Massachusetts Amherst, Technical Report 08-47.)

Abstract

Collective classification techniques jointly infer all class labels of a relational data set, using the inferences about one class label to influence inferences about related class labels. Typical collective classification schemes use computationally intensive iterative algorithms or approximate joint inference techniques. Kou and Cohen recently introduced an efficient relational model based on stacking that, despite its simplicity, performs equivalently to more sophisticated joint inference approaches. This stacked relational model trains on the inferred labels of related instances instead of the true labels, which are not typically present at inference time. This permits the use of efficient exact inference in place of more computationally intensive approximate joint inference. There are at least two possible causes for the unexpectedly high performance of the stacked approach: a reduction in inference bias (resulting from training on inferred rather than true labels) or a reduction in inference variance (due to the use of exact rather than approximate inference). Using experiments on both real and synthetic data, we show that the primary cause for the performance of the stacked model is the reduction in bias from learning the stacked model on inferred labels rather than the true labels. The reduction in variance due to conditional inference also contributes to the effect, but it is not as strong. In addition, we show that the performance of the joint inference and stacked learners can be attributed to an implicit weighting of local and relational features at learning time.
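The stacking idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic ring graph (`make_ring`), the hand-written logistic regression (`train_logreg`), the use of cross-validated base predictions as the "inferred labels" (`cross_val_probs`), and the mean-of-neighbors aggregate (`stack_features`). It shows only the general pattern: a base model on local features produces inferred labels, and a second model is trained on local features plus the inferred labels of related instances, so that inference needs one pass per model and no iterative joint procedure.

```python
import math
import random

def predict_proba(w, x):
    """Logistic model; w holds one weight per feature plus a trailing bias."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def make_ring(n, rng, noise=1.0):
    """Hypothetical data: ring graph, labels in runs of 10, one noisy local feature."""
    y = [(i // 10) % 2 for i in range(n)]
    X = [[(2 * yi - 1) + rng.gauss(0.0, noise)] for yi in y]
    nbrs = [[(i + d) % n for d in (-2, -1, 1, 2)] for i in range(n)]
    return X, y, nbrs

def cross_val_probs(X, y, k=5):
    """Base-model prediction for each training instance from a model that never saw it."""
    n = len(X)
    probs = [0.0] * n
    for fold in range(k):
        rest = [i for i in range(n) if i % k != fold]
        w = train_logreg([X[i] for i in rest], [y[i] for i in rest])
        for i in range(fold, n, k):
            probs[i] = predict_proba(w, X[i])
    return probs

def stack_features(X, probs, nbrs):
    """Append the mean inferred label of each instance's neighbors, rescaled to [-1, 1]."""
    return [x + [sum(2 * probs[j] - 1 for j in nb) / len(nb)]
            for x, nb in zip(X, nbrs)]

def accuracy(w, X, y):
    return sum((predict_proba(w, x) > 0.5) == bool(yi) for x, yi in zip(X, y)) / len(y)

rng = random.Random(0)
X_tr, y_tr, nb_tr = make_ring(200, rng)
X_te, y_te, nb_te = make_ring(200, rng)

# Learning: the stacked model sees inferred (not true) neighbor labels.
base_w = train_logreg(X_tr, y_tr)
S_tr = stack_features(X_tr, cross_val_probs(X_tr, y_tr), nb_tr)
stacked_w = train_logreg(S_tr, y_tr)

# Inference: one pass of the base model, then one pass of the stacked model.
test_probs = [predict_proba(base_w, x) for x in X_te]
S_te = stack_features(X_te, test_probs, nb_te)

base_acc = accuracy(base_w, X_te, y_te)
stacked_acc = accuracy(stacked_w, S_te, y_te)
print(f"base accuracy:    {base_acc:.3f}")
print(f"stacked accuracy: {stacked_acc:.3f}")
```

Because the stacked model is trained on the same kind of noisy neighbor predictions it will see at inference time, the weight it learns for the relational feature reflects how reliable those predictions actually are. Training on true neighbor labels instead would tend to overstate that reliability, which is the bias the abstract refers to.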

Text
A PDF version of this paper is available.
