Proximity Software

Knowledge Discovery Laboratory
PROXIMITY is an open-source system for relational knowledge discovery designed and implemented by the Knowledge Discovery Laboratory in the Department of Computer Science at the University of Massachusetts Amherst.

Proximity

Most real-world data sets are too large and complex for humans to analyze and understand on their own. Knowledge discovery methods are designed to fill this gap and enable “sense-making” of large data sets. Relational knowledge discovery focuses on analyzing and making sense of large relational data sets, that is, data in which entities such as people, places, things, and events are represented as separate objects, and the relationships between these entities, such as “Robert Duvall acted in The Handmaid's Tale,” are also represented explicitly. The PROXIMITY system allows users to easily understand and modify large relational data sets. In addition, users can build statistical relational models to deepen their understanding of the data and to make predictions about new data.
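
As an illustration only, the object-and-link representation described above can be sketched in Java as follows, using the paragraph's own example. The classes `RelationalGraph`, `Obj`, and `Link`, the attribute names, and the link type `ActedIn` are hypothetical; they do not reflect PROXIMITY's actual schema or API.

```java
import java.util.*;

/** Hypothetical minimal relational data model: both entities and relationships are explicit. */
class RelationalGraph {
    /** An entity such as a person, movie, or place, with named attribute values. */
    static final class Obj {
        final int id;
        final Map<String, String> attrs = new HashMap<>();
        Obj(int id) { this.id = id; }
    }

    /** A typed relationship between two objects; links can carry their own attributes. */
    static final class Link {
        final int from, to;
        final String type;
        final Map<String, String> attrs = new HashMap<>();
        Link(int from, int to, String type) { this.from = from; this.to = to; this.type = type; }
    }

    final List<Obj> objects = new ArrayList<>(); // index in this list == object id
    final List<Link> links = new ArrayList<>();

    /** Adds an object; attributes are given as alternating name/value pairs. */
    Obj addObject(String... nameValuePairs) {
        Obj o = new Obj(objects.size());
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            o.attrs.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        objects.add(o);
        return o;
    }

    /** Records an explicit relationship between two existing objects. */
    Link addLink(Obj from, Obj to, String type) {
        Link l = new Link(from.id, to.id, type);
        links.add(l);
        return l;
    }

    /** Objects reached from `o` by following outgoing links of the given type. */
    List<Obj> neighbors(Obj o, String type) {
        List<Obj> result = new ArrayList<>();
        for (Link l : links) {
            if (l.from == o.id && l.type.equals(type)) result.add(objects.get(l.to));
        }
        return result;
    }
}
```

For example, after adding a person object for Robert Duvall, a movie object for The Handmaid's Tale, and an `ActedIn` link between them, `neighbors(duvall, "ActedIn")` returns the movie object. Keeping links as first-class records is what allows a relationship to carry attributes of its own, such as a role name.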

PROXIMITY incorporates major research findings from the Knowledge Discovery Laboratory, including our graphical query language and model corrections for the statistical biases that arise from characteristics of relational data such as autocorrelation and degree disparity. PROXIMITY provides an open-source platform that can be used both for research on relational knowledge discovery and for practical applications to real-world data.

PROXIMITY 4.3 Features

  • High Performance
    PROXIMITY uses the MonetDB server, a fast, open-source, column-oriented ("vertical") database. For the kinds of operations that relational knowledge discovery requires, MonetDB allows PROXIMITY to run orders of magnitude faster than systems hosted on conventional SQL databases.
     
  • QGraph
    PROXIMITY's graphical query language (QGraph) efficiently finds matches to high-level descriptions of relational data patterns [Blau, Immerman, and Jensen, 2002]. A graphical editor supports interactive construction of QGraph queries.
     
  • Automatic Construction of Statistical Models
    PROXIMITY allows users to easily construct statistical relational models either through the Java API or from Python scripts. The models are trained on sets of labeled subgraphs, which can be created with QGraph queries. Either interface can also apply models to new (unlabeled) data. Rather than providing only the most likely label, our models specify a probability distribution over the possible labels for each subgraph. PROXIMITY allows users to evaluate model performance using both accuracy and receiver operating characteristic (ROC) curves. Models can be saved and reloaded for later use.
     
    • Relational Bayesian Classifiers
      A relational Bayesian classifier (RBC) is a relational version of the simple Bayesian classifier [Neville, Jensen, and Gallagher, 2003]. It builds a probabilistic model of an object's class label from the attributes of surrounding objects and links, treating each attribute as conditionally independent given the class. Although the RBC is a simple model, it performs quite well.
       
    • Relational Probability Trees
      A relational probability tree (RPT) selectively considers attributes of nearby objects and links, as well as complex aggregates of these attributes, to build a probabilistic model [Neville, Jensen, Friedland, and Hay, 2003; Jensen and Neville, 2002; Jensen, Neville, and Hay, 2003].
       
    • Relational Dependency Networks
      Relational dependency networks (RDNs) extend dependency networks to a relational setting [Neville and Jensen, 2004]. RDN models are a new form of probabilistic relational models that offer advantages over relational Bayesian networks (RBNs) and relational Markov networks (RMNs). Advantages of RDN models include an interpretable representation that facilitates knowledge discovery in relational data; the ability to represent arbitrary cyclic dependencies, including relational autocorrelation; and simple and efficient methods for learning both model structure and parameters.
       
  • Browser-Style Interface
    PROXIMITY provides an intuitive browser-style user interface and powerful tools for visualizing databases.
     
  • XML and Text Import
    PROXIMITY supports simple but flexible XML and text formats for importing data from earlier versions of PROXIMITY or from other databases and applications.
     
  • Python-Based Scripting
    All PROXIMITY operations that can be called directly from our Java API can also be invoked from Python scripts, or from the GUI through our interactive interpreter.
     
  • Open Source
    All of PROXIMITY's source code (written in Java) is included in the distribution.
     
  • Documentation
    The PROXIMITY distribution includes written documentation and examples.
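
The relational Bayesian classifier described above extends the simple (naive) Bayesian classifier to relational data, where each attribute of related objects yields a multiset of values rather than a single value. The RBC literature cited above examines more than one way to estimate probabilities from these multisets; the Java sketch below uses only the simplest, which treats every value in a multiset as conditionally independent given the class and applies Laplace smoothing. The class `RbcSketch`, its methods, and the smoothing choice are hypothetical and are not PROXIMITY's API.

```java
import java.util.*;

/**
 * Hypothetical RBC sketch. Each training subgraph is summarized as
 * attribute name -> multiset of values gathered from related objects.
 */
class RbcSketch {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // class -> attribute -> value -> count
    private final Map<String, Map<String, Map<String, Integer>>> valueCounts = new HashMap<>();
    // class -> attribute -> total number of values seen
    private final Map<String, Map<String, Integer>> totals = new HashMap<>();
    // attribute -> distinct values seen (used for Laplace smoothing)
    private final Map<String, Set<String>> vocab = new HashMap<>();
    private int n = 0;

    void train(Map<String, List<String>> features, String label) {
        n++;
        classCounts.merge(label, 1, Integer::sum);
        for (Map.Entry<String, List<String>> e : features.entrySet()) {
            String attr = e.getKey();
            for (String v : e.getValue()) {
                valueCounts.computeIfAbsent(label, c -> new HashMap<>())
                           .computeIfAbsent(attr, a -> new HashMap<>())
                           .merge(v, 1, Integer::sum);
                totals.computeIfAbsent(label, c -> new HashMap<>()).merge(attr, 1, Integer::sum);
                vocab.computeIfAbsent(attr, a -> new HashSet<>()).add(v);
            }
        }
    }

    /** Returns a probability distribution over class labels, not just the most likely label. */
    Map<String, Double> predict(Map<String, List<String>> features) {
        Map<String, Double> logScores = new HashMap<>();
        for (String c : classCounts.keySet()) {
            double s = Math.log(classCounts.get(c) / (double) n); // log prior
            for (Map.Entry<String, List<String>> e : features.entrySet()) {
                String attr = e.getKey();
                int k = vocab.getOrDefault(attr, Collections.emptySet()).size() + 1; // +1 slot for unseen values
                int total = totals.getOrDefault(c, Collections.emptyMap()).getOrDefault(attr, 0);
                Map<String, Integer> counts = valueCounts.getOrDefault(c, Collections.emptyMap())
                                                         .getOrDefault(attr, Collections.emptyMap());
                for (String v : e.getValue()) { // each value in the multiset counted independently
                    s += Math.log((counts.getOrDefault(v, 0) + 1.0) / (total + k));
                }
            }
            logScores.put(c, s);
        }
        // Normalize in log space for numerical stability.
        double max = Collections.max(logScores.values());
        double z = 0;
        for (double s : logScores.values()) z += Math.exp(s - max);
        Map<String, Double> dist = new HashMap<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            dist.put(e.getKey(), Math.exp(e.getValue() - max) / z);
        }
        return dist;
    }
}
```

For instance, after training on subgraphs whose related actors have hypothetical award values {yes, yes} and {yes, no} (class hit) and {no, no} and {no} (class flop), predicting on {yes, yes, no} yields roughly 0.83 for hit and 0.17 for flop.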
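
Because an object can have any number of related objects, a relational probability tree node cannot test a single neighbor value directly; it tests aggregates of the values instead. The Java sketch below shows a few illustrative aggregates (count, proportion, mode, average, degree) and a threshold test of the kind a tree node might split on. The class `RptFeatures` and its methods are hypothetical and are not PROXIMITY's API; a full RPT learner would also search over candidate features and thresholds and estimate a probability distribution at each leaf.

```java
import java.util.*;

/** Hypothetical aggregation functions over the values found on related objects. */
class RptFeatures {
    /** Number of related values equal to target. */
    static double count(List<String> values, String target) {
        return values.stream().filter(target::equals).count();
    }

    /** Fraction of related values equal to target (0 when there are no related objects). */
    static double proportion(List<String> values, String target) {
        return values.isEmpty() ? 0.0 : count(values, target) / values.size();
    }

    /** Number of related objects, regardless of their values. */
    static double degree(List<String> values) {
        return values.size();
    }

    /** Most frequent value; ties go to the lexicographically smallest value, so the result is deterministic. */
    static String mode(List<String> values) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String v : values) freq.merge(v, 1, Integer::sum);
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); }
        }
        return best;
    }

    /** Mean of numeric related values (NaN when there are none). */
    static double average(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    /** Binary test such as "COUNT(genre = drama) >= 2" that a tree node could split on. */
    static boolean test(double aggregateValue, double threshold) {
        return aggregateValue >= threshold;
    }
}
```

For example, for a movie whose related films have genres [drama, comedy, drama], `count(..., "drama")` is 2, `proportion` is about 0.67, `mode` is drama, and `degree` is 3, so a node testing COUNT(genre = drama) >= 2 would send this movie down its true branch.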
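
Relational dependency networks can represent cyclic dependencies such as relational autocorrelation, where an object's label depends on its neighbors' labels and vice versa. Dependency networks are typically queried with Gibbs sampling, which repeatedly resamples each unobserved variable from its conditional distribution given the current values of the others. The Java sketch below illustrates that procedure under our own assumptions: the class `RdnGibbsSketch` is hypothetical, and its hand-written logistic conditional stands in for the conditional distributions a real RDN would learn from data.

```java
import java.util.*;

/** Hypothetical sketch of Gibbs-sampling inference over autocorrelated binary labels. */
class RdnGibbsSketch {
    /** Stand-in conditional model: logistic in the fraction of neighbors currently labeled 1. */
    static double probLabelOne(int[] labels, List<Integer> nbrs, double weight) {
        if (nbrs.isEmpty()) return 0.5;
        int ones = 0;
        for (int j : nbrs) ones += labels[j];
        double frac = ones / (double) nbrs.size();
        return 1.0 / (1.0 + Math.exp(-weight * (2.0 * frac - 1.0)));
    }

    /** Estimates P(label = 1) per object; entries of `observed` that are non-null stay fixed. */
    static double[] marginals(List<List<Integer>> adj, Integer[] observed, double weight,
                              int burnIn, int sweeps, long seed) {
        int n = adj.size();
        Random rng = new Random(seed);
        int[] labels = new int[n];
        for (int i = 0; i < n; i++) labels[i] = observed[i] != null ? observed[i] : rng.nextInt(2);
        double[] ones = new double[n];
        for (int t = 0; t < burnIn + sweeps; t++) {
            for (int i = 0; i < n; i++) {
                if (observed[i] != null) continue; // evidence is clamped
                labels[i] = rng.nextDouble() < probLabelOne(labels, adj.get(i), weight) ? 1 : 0;
            }
            if (t >= burnIn) for (int i = 0; i < n; i++) ones[i] += labels[i]; // record after burn-in
        }
        for (int i = 0; i < n; i++) ones[i] /= sweeps;
        return ones;
    }
}
```

On a four-object chain in which the first object is observed with label 1 and the weight is 3, the estimated marginals of the three unobserved objects all rise above 0.5, showing how evidence propagates through autocorrelated links. A real RDN would instead learn each conditional distribution, for example with an RPT or an RBC.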
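
Because the models output a probability for each label, the accuracy and ROC evaluation mentioned above can be computed directly from those probabilities. The Java sketch below, which is not PROXIMITY's evaluation code, computes accuracy at a chosen threshold, the points of a ROC curve obtained by sweeping the threshold over the distinct scores, and the area under the curve via its rank-based (Mann-Whitney) interpretation with ties counted as one half. The class `EvalSketch` and its method names are hypothetical.

```java
import java.util.*;

class EvalSketch {
    /** Fraction of examples whose thresholded prediction matches the true binary label. */
    static double accuracy(double[] probPositive, boolean[] isPositive, double threshold) {
        int correct = 0;
        for (int i = 0; i < probPositive.length; i++) {
            if ((probPositive[i] >= threshold) == isPositive[i]) correct++;
        }
        return correct / (double) probPositive.length;
    }

    /** (false-positive rate, true-positive rate) points, sweeping the threshold from high to low. */
    static List<double[]> rocPoints(double[] probPositive, boolean[] isPositive) {
        int pos = 0, neg = 0;
        for (boolean b : isPositive) { if (b) pos++; else neg++; }
        TreeSet<Double> thresholds = new TreeSet<>(Comparator.<Double>reverseOrder());
        for (double p : probPositive) thresholds.add(p);
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0});
        for (double t : thresholds) {
            int tp = 0, fp = 0;
            for (int i = 0; i < probPositive.length; i++) {
                if (probPositive[i] >= t) { if (isPositive[i]) tp++; else fp++; }
            }
            points.add(new double[] {fp / (double) neg, tp / (double) pos});
        }
        return points;
    }

    /** AUC: probability that a random positive outscores a random negative (ties count 1/2). */
    static double auc(double[] probPositive, boolean[] isPositive) {
        double wins = 0;
        long pairs = 0;
        for (int i = 0; i < probPositive.length; i++) {
            if (!isPositive[i]) continue;
            for (int j = 0; j < probPositive.length; j++) {
                if (isPositive[j]) continue;
                pairs++;
                if (probPositive[i] > probPositive[j]) wins += 1.0;
                else if (probPositive[i] == probPositive[j]) wins += 0.5;
            }
        }
        if (pairs == 0) throw new IllegalArgumentException("need both positive and negative examples");
        return wins / pairs;
    }
}
```

For scores {0.9, 0.8, 0.4, 0.3} with true labels {positive, negative, positive, negative}, accuracy at threshold 0.5 is 0.5 and the AUC is 0.75. The pairwise AUC loop is quadratic for clarity; sorting the scores once gives an O(n log n) version.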
     

Platforms

PROXIMITY 4.3 is implemented in Java, making it platform-independent. In addition to source files, we provide ready-to-run installation files for the following platforms:

Platform                              PROXIMITY Client   MonetDB Server
Linux i86 (glibc 2.3 or higher)       X                  X
Linux i86/64 (glibc 2.3 or higher)    X                  X
Mac OS X (10.2 or higher)             X                  X
Mac OS X/64 (10.2 or higher)          X                  X
Mac OS X/Intel                        X                  X
Windows 2000/XP/Vista                 X                  X
Windows 2000/XP/Vista (64-bit)        X                  X

PROXIMITY 4.3 requires Java J2SE 1.5 or later.
