Statistical Challenges to Inductive Inference in Linked Data
D. Jensen (1998). "Statistical Challenges to Inductive Inference in Linked Data." Preliminary papers of the 7th International Workshop on Artificial Intelligence and Statistics.
Abstract
Many data sets can be represented naturally as collections of
linked objects. For example, document collections can be represented
as documents (nodes) connected by citations and hypertext references
(links). One important class of techniques for analyzing linked
data involves inductive inference. For example, researchers in
information retrieval might construct models to predict whether
a particular WWW document is a homepage based on features of other
documents to which it is connected (e.g., pages listing publications
or family photos). However, relatively little work examines the
unique statistical challenges of inductive inference in linked
data. This paper examines three such challenges: 1) statistical
dependence caused by linked instances; 2) bias introduced by sampling
density; and 3) multiple comparisons intensified by feature combinatorics.
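
As a rough illustration of the third challenge, the sketch below computes how quickly the chance of at least one spurious finding grows with the number of candidate features tested. It is not taken from the paper: the function names, the significance level of 0.05, and the assumption that the tests are independent are all illustrative choices. Linked data generally violates that independence assumption (the first challenge), so the figures should be read only as an indication of scale.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests tests.

    Assumes every null hypothesis is true and the tests are independent
    (an illustrative simplification, not a claim made by the paper).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under the standard Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05  # hypothetical per-test significance level
    for n_tests in (1, 10, 100, 1000):
        uncorrected = familywise_error_rate(alpha, n_tests)
        corrected = familywise_error_rate(bonferroni_alpha(alpha, n_tests), n_tests)
        print(f"{n_tests:5d} tests: uncorrected={uncorrected:.3f}, "
              f"Bonferroni-corrected={corrected:.3f}")
```

Under these assumptions, the uncorrected error rate approaches 1 once a few hundred candidate features are compared, while the corrected rate stays at or below the chosen level.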