Lise Getoor is an assistant professor at the University of Maryland, College Park. She and David Jensen are co-organizers of this workshop.
I've worked with Daphne Koller, Nir Friedman, Avi Pfeffer and Ben Taskar on the problem of learning Probabilistic Relational Models (PRMs). Avi introduced the PRM model in his thesis and studied how to do efficient inference in these first-order probabilistic models. In my thesis, I focused on learning the models, and this in turn led to some interesting extensions to the framework.
The basic PRM framework extends Bayesian networks to relational domains. The language supports the concepts of objects, object attributes and relations. The PRM defines a template for a distribution in which the dependencies are described for attributes at the class level: attributes can depend on other attributes of the same object, or on attributes of related objects. In the PRM framework, related objects are defined using entity-relationship terminology: we have reference slots, and we chain reference slots together to reach attributes of related objects. This can just as easily be done using a set of rules, as in Bayesian Logic Programs. When a chain returns a set of objects rather than a single object, we use aggregate functions to construct a single feature that characterizes the set.
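As a rough illustration of slot chains and aggregation, here is a minimal Python sketch. The student/registration/course data, the dict-based object store, and the helper names `follow_chain` and `aggregate_parent` are all assumptions made up for this example; they are not part of any PRM implementation.

```python
from statistics import mean

# Hypothetical toy database: each object is a dict, and a reference slot
# holds either a single object id or a list of object ids.
objects = {
    "s1": {"class": "Student", "registrations": ["r1", "r2"]},
    "r1": {"class": "Registration", "course": "c1", "grade": 4.0},
    "r2": {"class": "Registration", "course": "c2", "grade": 3.0},
    "c1": {"class": "Course", "difficulty": 2.0},
    "c2": {"class": "Course", "difficulty": 5.0},
}


def follow_chain(start_id, slots):
    """Follow a chain of reference slots and return the ids of the objects reached."""
    frontier = [start_id]
    for slot in slots:
        next_frontier = []
        for obj_id in frontier:
            target = objects[obj_id][slot]
            # A slot may be single-valued or multi-valued.
            next_frontier.extend(target if isinstance(target, list) else [target])
        frontier = next_frontier
    return frontier


def aggregate_parent(start_id, slots, attribute, aggregate=mean):
    """Summarize an attribute over the set of related objects as one feature."""
    values = [objects[obj_id][attribute] for obj_id in follow_chain(start_id, slots)]
    return aggregate(values)


# Average difficulty of the courses a student is registered in.
avg_difficulty = aggregate_parent("s1", ["registrations", "course"], "difficulty")
```

The aggregate (here the mean, but a mode, max or count would work the same way) is what lets a single attribute of the student depend on a variable-sized set of related courses.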
This template only tells part of the story. The semantics of basic PRMs require that we are given a relational skeleton in addition to the template. The relational skeleton defines our universe of discourse; it gives us the information needed to define the random variables in our distribution. For each class, the relational skeleton tells us the set of objects in that class and the set of relationships that exist between the objects. If the PRM dependency template describes a genetic domain, in which the attributes of a person depend on characteristics inherited from her mother and father, then the relational skeleton is the part of the model that defines a particular family tree, with individuals and the mother/father relationships between them.
Once we have the template together with the relational skeleton, we have a well-defined probability distribution. One way to think of it is that we can construct an unrolled Bayesian network that has a node for each attribute of each object in the relational skeleton. The parents of each node are instantiated according to the relational skeleton, and the parameters of the BN are tied so that they match the PRM template. This basic framework was described in our IJCAI 1999 paper and a book chapter in Saso Dzeroski and Nada Lavrac's Relational Data Mining book.
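The unrolling step can be sketched in a few lines of Python, using the family-tree example. This is only a sketch under simplifying assumptions: the attribute name `genotype`, the individuals, the dict layout, and the `unroll` function are hypothetical, and founders with unknown parents are handled by simply omitting the missing parent.

```python
# Template (class level): Person.genotype depends on the genotype of the
# objects reached through the "mother" and "father" reference slots.
template = {
    ("Person", "genotype"): [("mother", "genotype"), ("father", "genotype")],
}

# Relational skeleton: the objects in each class and the relations among them.
# None marks a parent that is not part of this (hypothetical) family tree.
skeleton = {
    "ann": {"class": "Person", "mother": None, "father": None},
    "bob": {"class": "Person", "mother": None, "father": None},
    "cara": {"class": "Person", "mother": "ann", "father": "bob"},
    "dan": {"class": "Person", "mother": "cara", "father": None},
}


def unroll(template, skeleton):
    """Build the ground network: one node per (object, attribute) pair."""
    ground = {}
    for obj_id, obj in skeleton.items():
        for (cls, attr), parent_specs in template.items():
            if obj["class"] != cls:
                continue
            parents = []
            for slot, parent_attr in parent_specs:
                related = obj[slot]
                if related is not None:
                    parents.append((related, parent_attr))
            # Parameter tying: every ground node points back to the single
            # class-level CPD identified by (class, attribute).
            ground[(obj_id, attr)] = {"parents": parents, "cpd": (cls, attr)}
    return ground


ground_bn = unroll(template, skeleton)
```

The same template applied to a different skeleton (a different family tree) yields a different ground network, while the class-level parameters stay shared.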
We've also looked at incorporating uncertainty over the relational skeleton into the probabilistic model. One interesting finding is that even when the relational structure is observed, incorporating it into the probabilistic model can improve the model's predictive accuracy. We examined this in an ICML 2001 paper and more fully in a JMLR 2002 paper.
Another interesting aspect that we have looked at is the continuum from instance-based models to class-based models. We had an early workshop paper on this, but it is described more fully in my thesis. An interesting application of this is to collaborative filtering and recommender systems.
We've also been investigating something that we call a Statistical Relational Model (SRM). Rather than describing a distribution over some collection of individuals defined by a relational skeleton, SRMs capture frequency-based information, i.e. the probability that a randomly chosen individual displays certain characteristics. It turns out that this type of model has some interesting applications. For example, SRMs are well suited to capturing a compact model of the statistics of a relational database. Such a model may be used as input to some other data mining algorithm, or it can be used directly, for example to do database query selectivity estimation. This application is described in our SIGMOD 2001 paper.
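To give a feel for the selectivity estimation idea, here is a minimal Python sketch for a single table. It is not the method from the paper: the two-attribute model, all of the probabilities, the table size, and the function names are assumptions invented for illustration, and the sketch ignores joins entirely.

```python
# Hypothetical table with two attributes, age and income, and a tiny
# probabilistic model P(age) * P(income | age) standing in for its statistics.
table_size = 10_000  # assumed number of rows

p_age = {"young": 0.3, "old": 0.7}
p_income_given_age = {
    "young": {"low": 0.8, "high": 0.2},
    "old": {"low": 0.4, "high": 0.6},
}
income_values = ("low", "high")


def joint(age, income):
    """Probability that a randomly chosen row has this age and income."""
    return p_age[age] * p_income_given_age[age][income]


def estimate_result_size(age=None, income=None):
    """Estimate how many rows match the equality predicates (None = unconstrained)."""
    prob = sum(
        joint(a, i)
        for a in p_age
        for i in income_values
        if (age is None or a == age) and (income is None or i == income)
    )
    return table_size * prob


# Estimated size of: SELECT * FROM t WHERE age = 'young' AND income = 'high'
estimate = estimate_result_size(age="young", income="high")
```

Because the model stores the dependence of income on age, the estimate differs from what an attribute-independence assumption would give, which is the kind of correlation a compact frequency model is meant to capture.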
Since I've come to UMD, I've been looking at other structured statistical models. With Edward Hung and V. S. Subrahmanian, I've developed a probabilistic model for semi-structured data, PXML. Of particular interest for this workshop is work on link-based classification that I have done with my student Qing Lu. A preliminary version appears in the IJCAI 2003 workshop on Text and Link Mining. The final version appears in our ICML 2003 paper.