PROXIMITY is an open-source system for relational
knowledge discovery designed and implemented by the Knowledge Discovery Laboratory
in the Department of Computer
Science at the University of Massachusetts
Amherst.
Most real-world data sets are too large and complex for
humans to analyze and understand on their own. Knowledge
discovery methods are designed to fill in this gap and
enable “sense-making” of large data sets.
Relational knowledge discovery focuses on analyzing and
sense-making in large relational data sets, that is,
data where entities such as people, places, things, and
events are represented as separate objects and the
relationships between these entities, such as “Robert
Duvall acted in The Handmaid's Tale,” are also
explicitly represented. The PROXIMITY
system allows users to easily understand and modify large
relational data sets. In addition, users can build
statistical relational models to enhance their
understanding of the data and to make predictions about new
data.
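To make the object/link distinction concrete, here is a minimal sketch of how the "Robert Duvall acted in The Handmaid's Tale" example could be stored as separate objects, links, and attributes. The class `RelationalGraph`, its methods, and the attribute names (`type`, `name`, `title`) are hypothetical illustrations, not PROXIMITY's actual schema or Java/Python API.

```python
from dataclasses import dataclass, field


@dataclass
class RelationalGraph:
    """Toy store that keeps objects, links, and attributes separate.

    Hypothetical structure for illustration; not PROXIMITY's schema.
    """
    objects: set = field(default_factory=set)
    links: dict = field(default_factory=dict)   # link id -> (object id, object id)
    attrs: dict = field(default_factory=dict)   # (object or link id, name) -> value

    def add_object(self, oid, **attributes):
        self.objects.add(oid)
        for name, value in attributes.items():
            self.attrs[(oid, name)] = value

    def add_link(self, lid, src, dst, **attributes):
        # Links are first-class items: they have ids and can carry attributes.
        self.links[lid] = (src, dst)
        for name, value in attributes.items():
            self.attrs[(lid, name)] = value

    def neighbors(self, oid, link_type=None):
        """Objects connected to oid, optionally only via links whose 'type' matches."""
        found = []
        for lid, (src, dst) in self.links.items():
            if link_type is not None and self.attrs.get((lid, "type")) != link_type:
                continue
            if src == oid:
                found.append(dst)
            elif dst == oid:
                found.append(src)
        return found


# The example relationship from the text, as two objects and one link.
g = RelationalGraph()
g.add_object("duvall", type="person", name="Robert Duvall")
g.add_object("handmaid", type="movie", title="The Handmaid's Tale")
g.add_link("l1", "duvall", "handmaid", type="acted-in")
print(g.neighbors("duvall", link_type="acted-in"))  # ['handmaid']
```

Because the link is an item in its own right, it can carry attributes (here, its type) just like the objects it connects.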
PROXIMITY incorporates major research
findings from the Knowledge
Discovery Laboratory, including model corrections for
statistical biases inherent in relational data such as
autocorrelation and degree disparity, as well as our graphical
query language. PROXIMITY provides an
open-source platform that can be used for both research into
relational knowledge discovery and practical applications to
real-world data.
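As a rough illustration of the two biases named above, the sketch below measures (1) autocorrelation, as the correlation of one attribute's values across the two ends of each link, and (2) degree disparity, as a difference in average link count between class labels. These are simple stand-in statistics chosen for clarity; they are not necessarily the measures or corrections used in the Knowledge Discovery Laboratory's papers or in PROXIMITY, and the function names and toy data are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / (vx * vy) ** 0.5


def link_autocorrelation(links, value):
    """Correlation of one attribute's values across the two ends of each link.

    Each link contributes both (a, b) and (b, a), so the result does not
    depend on link direction.
    """
    xs, ys = [], []
    for a, b in links:
        xs += [value[a], value[b]]
        ys += [value[b], value[a]]
    return pearson(xs, ys)


def mean_degree_by_class(links, label):
    """Average number of links per object, grouped by the object's class label."""
    degree = {obj: 0 for obj in label}
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    groups = {}
    for obj, cls in label.items():
        groups.setdefault(cls, []).append(degree[obj])
    return {cls: mean(ds) for cls, ds in groups.items()}


# Hypothetical toy data: linked objects share the same 0/1 attribute value,
# and objects labeled "+" have slightly more links than objects labeled "-".
links = [("a", "b"), ("b", "c"), ("d", "e")]
value = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0}
label = {"a": "+", "b": "+", "c": "+", "d": "-", "e": "-"}
print(link_autocorrelation(links, value))
print(mean_degree_by_class(links, label))
```

In this toy graph every link joins objects with equal values, so the autocorrelation is at its maximum of 1; and any aggregate that counts related items would partly reflect the degree difference between the two classes rather than the attribute itself.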
PROXIMITY 4.3 Features
- High Performance
PROXIMITY uses the
MonetDB server, a fast,
open-source, column-oriented (vertical) database. MonetDB allows
PROXIMITY to be orders of magnitude faster than
systems built on conventional SQL databases for the kinds of operations needed
by relational knowledge discovery.
- QGraph
PROXIMITY's graphical query
language (QGraph) computes fast matches to high-level descriptions
of relational data patterns [Blau,
Immerman, and Jensen, 2002]. A graphical editor supports
interactive creation of QGraph queries.
- Automatic Construction of Statistical Models
PROXIMITY allows a user to easily construct
statistical relational models from either the Java API or from
Python scripts. The models are trained using sets of labeled
subgraphs that can be created from QGraph queries. Using either
interface, models can also be applied to new (unlabeled) data.
Instead of only providing the most likely label, our models
specify a probability distribution over the possible labels for
each subgraph. PROXIMITY allows the user to
evaluate the performance of the models using both accuracy and
receiver operating characteristic (ROC) curves. These models can be saved and
reloaded for later use.
- Relational Bayesian Classifiers
A relational
Bayesian classifier (RBC) is a relational version of the simple
Bayesian classifier [Neville,
Jensen, and Gallagher, 2003]. This classifier builds a
probabilistic model of each attribute based on the attributes of
surrounding objects and links. Although the RBC is a simple
model, it performs quite well.
- Relational Probability Trees
A relational
probability tree (RPT) selectively considers attributes of
nearby objects and links as well as complex aggregates of these
attributes to build a probabilistic model [Neville,
Jensen, Friedland, and Hay, 2003; Jensen
and Neville, 2002; Jensen,
Neville, and Hay, 2003].
- Relational Dependency Networks
Relational
dependency networks (RDNs) extend dependency networks to a
relational setting [Neville
and Jensen, 2004]. RDN models are a new form of
probabilistic relational models that offer advantages over
relational Bayesian networks (RBNs) and relational Markov
networks (RMNs). Advantages of RDN models include an
interpretable representation that facilitates knowledge
discovery in relational data; the ability to represent
arbitrary cyclic dependencies, including relational
autocorrelation; and simple and efficient methods for learning
both model structure and parameters.
- Browser-Style Interface
PROXIMITY provides users with an intuitive
browser-style user interface, and with powerful database
visualization tools.
- XML and Text Import
PROXIMITY supports simple but flexible XML
and text formats for importing data from earlier versions
of PROXIMITY, or from other databases or
applications.
- Python-Based Scripting
All
PROXIMITY operations that can be called
directly from our Java API can also be invoked by Python
scripts or called from the GUI via our interactive
interpreter.
- Open Source
All of
PROXIMITY's source code (written in Java) is
included in the distribution.
- Documentation
The
PROXIMITY distribution includes written
documentation and examples.
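The relational Bayesian classifier described above can be approximated in a few lines. The sketch below is a naive-Bayes-style classifier in which each feature is a multiset of values gathered from related objects (for example, an award flag for every actor linked to a movie), and each value in the multiset is treated as an independent draw given the class label. It is an illustration in the spirit of the RBC [Neville, Jensen, and Gallagher, 2003], not PROXIMITY's implementation or API; the class `TinyRBC`, the feature name `actor.award`, and the training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyRBC:
    """Naive-Bayes-style classifier over multisets of related-object values.

    Each example is (label, features), where features maps a feature name
    to a list of values collected from related objects. Every value in a
    list is treated as an independent draw given the class label.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.vocab = defaultdict(set)             # feature -> observed values

    def fit(self, examples):
        for label, features in examples:
            self.class_counts[label] += 1
            for name, values in features.items():
                for v in values:
                    self.value_counts[(label, name)][v] += 1
                    self.vocab[name].add(v)
        return self

    def predict_proba(self, features):
        """Return a probability distribution over labels, not just the best label."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for name, values in features.items():
                counts = self.value_counts[(label, name)]
                # +1 in the vocabulary size reserves mass for unseen values.
                denom = sum(counts.values()) + self.alpha * (len(self.vocab[name]) + 1)
                for v in values:
                    score += math.log((counts[v] + self.alpha) / denom)
            log_scores[label] = score
        # Normalize in log space for numerical stability.
        top = max(log_scores.values())
        weights = {k: math.exp(s - top) for k, s in log_scores.items()}
        z = sum(weights.values())
        return {k: w / z for k, w in weights.items()}


train = [
    ("hit", {"actor.award": ["yes", "yes", "no"]}),
    ("hit", {"actor.award": ["yes"]}),
    ("flop", {"actor.award": ["no", "no"]}),
    ("flop", {"actor.award": ["no", "yes"]}),
]
model = TinyRBC().fit(train)
print(model.predict_proba({"actor.award": ["yes", "yes"]}))  # roughly 0.8 for "hit"
```

Treating each related value independently is what keeps the model simple: it needs no aggregation step, and objects with different numbers of neighbors are handled by the same product of per-value terms.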
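Because the models described above output a probability for each label rather than a single prediction, they can be evaluated both by accuracy at a fixed threshold and by a ROC curve, which sweeps the threshold over all scores. The following is a generic sketch of those two measures, not PROXIMITY's own evaluation code; the function names and the toy scores are hypothetical.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) points from sweeping a threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        # Consume all examples with the same score together, so ties form one step.
        s = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    return sum((s >= threshold) == bool(y) for s, y in zip(scores, labels)) / len(labels)


scores = [0.9, 0.8, 0.6, 0.4, 0.3]   # predicted P(positive) for five subgraphs
labels = [1, 1, 0, 1, 0]             # true labels
print(accuracy(scores, labels))
print(auc(roc_points(scores, labels)))
```

Here accuracy at a 0.5 threshold is 0.6, while the area under the ROC curve is 5/6: five of the six positive/negative pairs are ranked correctly. The two measures answer different questions, which is why it is useful to report both.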
Platforms
PROXIMITY 4.3 is implemented in Java, making
it platform-independent. In addition to source files, we
provide ready-to-run installation files for the following
platforms:
- Linux i86 (glibc 2.3 or higher)
- Linux i86 64-bit (glibc 2.3 or higher)
- Mac OS X (10.2 or higher)
- Mac OS X 64-bit (10.2 or higher)
- Mac OS X (Intel)
- Windows 2000/XP/Vista
- Windows 2000/XP/Vista 64-bit
Installation files are provided separately for the PROXIMITY client and the MonetDB server.
PROXIMITY 4.3 requires Java J2SE 1.5 or later.