Statistical Challenges to Inductive Inference in Linked Data

D. Jensen (1998). "Statistical Challenges to Inductive Inference in Linked Data." Preliminary papers of the 7th International Workshop on Artificial Intelligence and Statistics.

Abstract
Many data sets can be represented naturally as collections of linked objects. For example, document collections can be represented as documents (nodes) connected by citations and hypertext references (links). One important class of techniques for analyzing linked data involves inductive inference. For example, researchers in information retrieval might construct models to predict whether a particular WWW document is a homepage based on features of other documents to which it is connected (e.g., pages listing publications or family photos). However, relatively little work examines the unique statistical challenges of inductive inference in linked data. This paper examines three such challenges: 1) statistical dependence caused by linked instances; 2) bias introduced by sampling density; and 3) multiple comparisons intensified by feature combinatorics.
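A small calculation can make the third challenge concrete. The sketch below is illustrative and is not the paper's analysis. It treats each candidate feature as one statistical test. It assumes a hypothetical feature space in which every feature pairs one attribute of a linked neighbor with one aggregation function. It then computes two things: the chance that at least one feature looks "significant" purely by chance, and the Bonferroni-corrected per-test level. The names `n_attributes` and `n_aggregators` and their values are assumptions made up for the example. The family-wise formula also assumes independent tests, which features computed over the same linked data rarely are.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability that at least one of k independent null tests rejects at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test level that bounds the family-wise error rate at alpha (valid under any dependence)."""
    return alpha / k


# Hypothetical feature space (illustrative assumption, not from the paper):
# each candidate feature combines one attribute of a linked object with one
# aggregation function (e.g., count, mean, max) over the linked objects.
n_attributes = 20
n_aggregators = 5
k = n_attributes * n_aggregators  # number of candidate features, i.e. tests

alpha = 0.05
fwer = familywise_error_rate(alpha, k)  # chance of >= 1 spurious "significant" feature
per_test = bonferroni_alpha(alpha, k)   # corrected threshold for each feature

print(f"candidate features: {k}")
print(f"chance of at least one spurious feature: {fwer:.3f}")
print(f"Bonferroni per-test alpha: {per_test:.4f}")
```

Adding more link types or longer paths multiplies `k`. As `k` grows, the uncorrected family-wise error rate approaches 1, and the corrected per-test threshold becomes very strict.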

