Proximity 4.3 Tutorial


The Proximity Tutorial, including source files and examples, is part of the open-source Proximity system. See the LICENSE file for copyright and license information.

All trademarks or registered trademarks are the property of their respective owners.

This effort is or has been supported by AFRL, DARPA, NSF, and LLNL/DOE under contract numbers F30602-00-2-0597, F30602-01-2-0566, HR0011-04-1-0013, EIA9983215, and W7405-ENG-48 and by the National Association of Securities Dealers (NASD) through a research grant with the University of Massachusetts. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFRL, DARPA, NSF, LLNL/DOE, NASD, the University of Massachusetts Amherst, or the U.S. Government.

The example database used to support the exercises in this tutorial, ProxWebKB, was developed from the publicly available WebKB relational data set created by the Text Learning Group at Carnegie Mellon University. The version used for the Proximity tutorial has been modified from the original distribution to meet the needs of this tutorial. The original data set is available from

General inquiries regarding Proximity should be directed to:

Knowledge Discovery Laboratory
c/o Professor David Jensen, Director
Department of Computer Science
University of Massachusetts

Amherst, Massachusetts 01003-9264

November 15, 2007

Table of Contents

1. Introduction
    Conventional Knowledge Discovery
    Relational Knowledge Discovery
    Proximity Advantages
2. Getting Started with Proximity
    Using the Tutorial
    Using Proximity
    Contact information
    Tips and Reminders
3. Importing and Exporting Proximity Data
    Importing XML Data
    Transforming Tabular Data to XML
    Exporting Data to XML
    Importing Plain Text Data
    Exporting Plain Text Data
    Specialized Data Export
    Deleting Proximity Databases
    Tips and Reminders
4. Exploring Data
    The Proximity User Interface
    Exploring Objects and Links
    Exploring Attributes
    Using the Location Bar
    Visualizing Data
    Setting Display Preferences
    Analyzing the Database Schema
    Tips and Reminders
5. Querying the Database
    A First Proximity Query
    Exploring Containers and Subgraphs
    Grouping Elements in a Query
    Comparing Items in a Query
    Matching Complex Subgraphs with Subqueries
    Adding Links to Data with Queries
    Executing a Query from the Proximity Database Browser
    Executing a Query from the Command Line
    Querying Containers
    Tips and Reminders
6. Using Scripts
    Working with Scripts
    Running Proximity Scripts
    Using the Proximity Python Interpreter
    Sampling the Database
    Adding a New Attribute
    Social Networking Algorithms
    Working with Proximity Tables
    Synthetic Data Generation
    Tips and Reminders
7. Learning Models
    The Modeling Process in Proximity
    Relational Bayesian Classifier
    Relational Probability Trees
    Relational Dependency Networks
    Tips and Reminders
A. Proximity Quick Reference
    MonetDB Server
    Proximity Shell Scripts and Batch Files
    Query Editor Keyboard Shortcuts
    Proximity Python Interpreter Commands
    Location Bar Path Syntax
    DTD Files
    Technical Support and Documentation
B. Installation
    Obtaining Proximity
    Installing MonetDB
    Installing Proximity
    Updating MonetDB Databases
C. Proximity XML Format
    The PROX3DB root element
D. Proximity Text Data Format
    File Formats

List of Exercises

3.1. Importing the ProxWebKB data into Proximity
3.2. Importing attribute values using XML
3.3. Importing additional link_tag attribute values
3.4. Exporting a database to XML
3.5. Exporting an attribute to XML
3.6. Importing a database using plain text data
3.7. Importing an attribute using plain text data
3.8. Exporting a database to plain text
4.1. Exploring objects and links
4.2. Exploring attributes
4.3. Using the location bar
4.4. Exploring data with the graphical data browser
4.5. Customizing object and link labels
4.6. Exploring the database schema
5.1. Creating a first Proximity query
5.2. Exploring containers and subgraphs
5.3. Creating a query with numeric annotations
5.4. Adding constraints to a query
5.5. Using subqueries in a query
5.6. Adding links with a query
5.7. Executing a saved query from the Proximity Database Browser
5.8. Executing a query from the command line
5.9. Querying a container from the Proximity Database Browser
6.1. Running a script from the Proximity Database Browser
6.2. Running a script from the command line
6.3. Running a script interactively
6.4. Creating training and test sets
6.5. Adding a new attribute
6.6. Running social networking algorithms
6.7. Finding specific links
6.8. Generating synthetic i.i.d. data
7.1. Learning and applying the relational Bayesian classifier model
7.2. Learning and applying the relational probability tree model
7.3. Viewing relational probability trees
7.4. Learning and applying the relational dependency network model
7.5. Viewing relational dependency network graphs