Importing XML Data

This section describes how to import XML data into Proximity using the provided ProxWebKB database as an example. Proximity lets you import a complete database, including any subgraphs and containers, or you can import individual attributes or containers.

By default, Proximity restricts the structure of the XML data file so that you cannot accidentally create identity conflicts. For example, Proximity prohibits adding objects to a database once the initial set of objects has been defined; therefore, the XML data file can contain only a single objects section, and you cannot add more objects with a subsequent import. Similarly, you cannot add new links, or add more values to an attribute, once you have imported the initial set of links or attribute values. You can override these restrictions by using the noChecks option to the import script, described in “Importing XML data using noChecks”.

[Caution]

You are responsible for ensuring the integrity of the data in an XML file. Proximity does not check that attributes are assigned to items (objects, links, subgraphs, or containers) that actually exist in the database. Assigning attribute values to non-existent items does not trigger an exception or warning.
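Because Proximity does not detect attribute values assigned to missing items, it can be worth checking a file before you import it. The shell function below is a minimal sketch of one such check; it is not a Proximity tool, and the function name find_dangling_ids and its two input files are hypothetical. It assumes you have already extracted, by whatever means suits your XML file, the IDs of one item type (for example, objects) into one text file and the item IDs referenced by that type's attribute values into another, one ID per line. It prints every referenced ID that is not defined.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proximity): report item IDs that attribute
# values refer to but that are not defined as items. Compare one item type at
# a time, e.g. object attributes against object IDs.
# Usage: find_dangling_ids DEFINED_IDS_FILE REFERENCED_IDS_FILE
find_dangling_ids() {
  fd_defined="$1"      # file of item IDs defined in the XML file
  fd_referenced="$2"   # file of item IDs used by attribute values
  fd_def_sorted=$(mktemp) || return 1
  fd_ref_sorted=$(mktemp) || return 1
  # comm needs both inputs sorted with the same collation, so force LC_ALL=C.
  LC_ALL=C sort -u "$fd_defined" > "$fd_def_sorted"
  LC_ALL=C sort -u "$fd_referenced" > "$fd_ref_sorted"
  # comm -13 keeps only lines unique to the second file:
  # IDs that are referenced but never defined.
  LC_ALL=C comm -13 "$fd_def_sorted" "$fd_ref_sorted"
  rm -f "$fd_def_sorted" "$fd_ref_sorted"
}
```

For example, find_dangling_ids object-ids.txt attr-item-ids.txt (both file names hypothetical) prints nothing when every referenced object ID is defined.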

The sections below walk through the process of first importing a database and then importing values for a new attribute on the existing database objects.

Importing databases using XML

The exercise below walks through the process of importing the XML version of the ProxWebKB database. ProxWebKB was developed from the WebKB relational data set [Craven et al., 1999] available from www-2.cs.cmu.edu/~WebKB/. The version used for the Proximity tutorial has been modified from the public distribution to meet the needs of this tutorial. Modifications include some data cleanup and the creation of additional object attributes based on the data in the distributed version.

Exercise 3.1. Importing the ProxWebKB data into Proximity:

  1. Uncompress the compressed ProxWebKB XML data file.

    > cd $PROX_HOME/doc/user/tutorial/examples
    > gunzip proxwebkb.xml.gz
    
  2. Copy the file prox3db.dtd from $PROX_HOME/resources to the directory containing the XML data file, $PROX_HOME/doc/user/tutorial/examples.

    > cp $PROX_HOME/resources/prox3db.dtd $PROX_HOME/doc/user/tutorial/examples/
    
  3. Start the MonetDB server. The XML data file must be on the same machine as the MonetDB server that serves the database.

    > Mserver --dbname ProxWebKB $PROX_HOME/resources/init-mserver.mil
    

    The init-mserver.mil script sets the port for the server to 30000. To use a different port, add --set port=nnnnn (where nnnnn is the new port number) to the command line. For example:

    > Mserver --dbname ProxWebKB $PROX_HOME/resources/init-mserver.mil \
      --set port=45678
    

    Remember to use a port number > 40000 if you are using MonetDB 4.6.2. See “Running the MonetDB database server” for more information on starting and using the MonetDB server.

    Because the database does not yet exist, MonetDB prints warning statements along with its usual startup message:

    !WARNING: GDKlockHome: created directory
        /usr/local/Monet-mars/var/MonetDB4/dbfarm/ProxWebKB/
    !WARNING: GDKlockHome: ignoring empty or invalid .gdk_lock.
    !WARNING: BBPdir: initializing BBP.
    # MonetDB Server v4.20.0
    # based on GDK   v1.20.0
    # Copyright (c) 1993-2007, CWI. All rights reserved.
    # Compiled for powerpc-apple-darwin8.10.0/32bit with 32bit OIDs; dynamically linked.
    # Visit http://monetdb.cwi.nl/ for further information.
    Listening on port 30000
    MonetDB>
    

    The startup message may be slightly different depending on your operating system and the version of MonetDB you are using.

    MonetDB also creates a ProxWebKB directory in its dbfarm directory to hold the new database.

    Leave the MonetDB server running for the remainder of the import process. The MonetDB server must be serving the database for any Proximity action that interacts with database data.

  4. Initialize the new Proximity database. (Substitute the appropriate port number if you are using a different port.)

    > cd $PROX_HOME
    > bin/db-util.sh localhost:30000 init-db
    

    Proximity outputs the following trace (leading information showing elapsed time and execution thread has been omitted from the trace for brevity):

    INFO  app.DBUtil: * connecting to db
    INFO  app.DBUtil: * database opened; initializing Prox tables
    INFO  db.DB: * initializing Proximity database
    INFO  app.DBUtil: * tables initialized
    INFO  app.DBUtil: * disconnecting from db
    INFO  app.DBUtil: * done
    
  5. Import the XML data file into the new Proximity database. (Substitute the appropriate port number if you are using a different port.)

    > bin/import-xml.sh localhost:30000 \
      $PROX_HOME/doc/user/tutorial/examples/proxwebkb.xml
    

    When the import process is finished, Proximity reports the number of database entities created:

    INFO  app.ImportXMLApp: * done importing; counts:
      4135 objects, 10934 links, 13 attributes, 222052 attribute values,
      0 containers, 0 subgraph items
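The steps above can be collected into a single script so that every command uses the same port number. The following is a minimal sketch under stated assumptions, not a script shipped with Proximity: PORT, DRY_RUN, and run_step are hypothetical names, and the MonetDB server (step 3) must already be running on this machine. By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch of Exercise 3.1, steps 1, 2, 4, and 5, driven by one PORT setting.
# Assumptions: PORT, DRY_RUN, and run_step are hypothetical names, not part
# of Proximity. Step 3 (starting Mserver on this machine) is done separately,
# and the server must stay running until the import finishes.

PORT="${PORT:-30000}"     # must match the port Mserver is listening on
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only; 0 = run them
EXAMPLES="$PROX_HOME/doc/user/tutorial/examples"

# Print a command, then run it in the current shell only when DRY_RUN is 0.
# Any failing step stops the script.
run_step() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@" || { echo "step failed: $*" >&2; exit 1; }
  fi
}

if [ "$DRY_RUN" = "0" ] && [ -z "$PROX_HOME" ]; then
  echo "PROX_HOME is not set" >&2
  exit 1
fi

run_step gunzip "$EXAMPLES/proxwebkb.xml.gz"                 # step 1
run_step cp "$PROX_HOME/resources/prox3db.dtd" "$EXAMPLES/"  # step 2
run_step cd "$PROX_HOME"                                     # step 4
run_step bin/db-util.sh "localhost:$PORT" init-db
run_step bin/import-xml.sh "localhost:$PORT" "$EXAMPLES/proxwebkb.xml"  # step 5
```

For example, PORT=45678 DRY_RUN=0 sh import-proxwebkb.sh (the file name is hypothetical) runs the import against a server started with --set port=45678. Printing by default lets you review the commands first, which is useful because a second import cannot add the same objects to the database again.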