Importing Plain Text Data

This section describes how to import plain text data into Proximity. Proximity lets you import a complete database, including any subgraphs and containers, or you can import individual attributes or containers.

[Caution]

The utilities that use the Proximity plain text data format perform no error checking. Proximity does not verify that attributes are assigned to items (objects, links, subgraphs, or containers) that are actually present in the database. Assigning attribute values to non-existent items does not trigger an exception or warning, and nothing prevents existing data from being incorrectly overwritten. Although this format requires less disk space than the XML format and its use can improve import and export speed, you are solely responsible for maintaining data integrity and consistency when using this format.

Unlike the XML data format, the plain text data format does not prevent you from adding further elements to a database after elements of that kind have already been defined. There is therefore no need for an equivalent to the noChecks option available for importing XML data. When using the plain text format, you must take care to ensure the integrity and consistency of your data; Proximity will not necessarily alert you to data errors when using this format.
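Because Proximity does not check plain text data for you, it can help to run your own consistency check before importing. The following is a minimal sketch, not a Proximity utility. It assumes that both files hold one record per line, are tab-delimited, and carry the item ID in the first field; that layout and the function name missing_ids are assumptions made for illustration only.

```shell
# Sketch only: list item IDs that appear in an attribute file but not in
# a file of known item IDs. Assumed layout for both files: one record per
# line, tab-delimited, item ID in the first field.
# Usage: missing_ids ID_FILE ATTR_FILE
missing_ids() {
    awk -F '\t' -v idfile="$1" '
        BEGIN {
            # Load every known ID before reading the attribute file.
            while ((getline line < idfile) > 0) {
                split(line, field, "\t")
                known[field[1]] = 1
            }
            close(idfile)
        }
        # Print the ID of each non-empty record whose ID is unknown.
        NF > 0 && !($1 in known) { print $1 }
    ' "$2"
}
```

For example, missing_ids objects.data O_attr_title.data would print each ID in the attribute file that has no matching object, where objects.data is a hypothetical name for the file listing object IDs. Empty output means every attribute value refers to a known item.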

Importing databases using plain text

The exercise below walks through the process of importing a sample database using the plain text data format. This database stores selected data for a small set of movies, actors, and directors. The set of files for this exercise illustrates how the plain text data format represents all types of Proximity database entities.

Exercise 3.6. Importing a database using plain text data:

This exercise and Exercise 3.7 use a different database than the one used for most Tutorial exercises. Before beginning these exercises, make sure that you are no longer serving the ProxWebKB database. Data files must be on the same machine as the one serving the database.

  1. Uncompress and unpack the plain text data files.

    > cd $PROX_HOME/doc/user/tutorial/examples
    > gunzip movie_db.tar.gz
    > tar -xf movie_db.tar
    

    Unpacking the tar file creates a MovieDB directory under the $PROX_HOME/doc/user/tutorial/examples directory. All plain text data files required for the current import operation must be located in the same directory.

  2. Start the MonetDB server. Data files must be on the same machine as that serving the database.

    > Mserver --dbname MovieDB $PROX_HOME/resources/init-mserver.mil
    

    The init-mserver.mil script sets the port for the server to 30000. If you are using MonetDB 4.6.2, remember to use a port number greater than 40000 instead. See “Running the MonetDB database server” for more information.

    Because the database does not exist, MonetDB prints warning statements along with its usual startup message:

    !WARNING: GDKlockHome: created directory
        /usr/local/Monet-mars/var/MonetDB/dbfarm/MovieDB/
    !WARNING: GDKlockHome: ignoring empty or invalid .gdk_lock.
    !WARNING: BBPdir: initializing BBP.
    # MonetDB Server v4.20.0
    # based on GDK   v1.20.0
    # Copyright (c) 1993-2007, CWI. All rights reserved.
    # Compiled for powerpc-apple-darwin8.10.0/32bit with 32bit OIDs; dynamically linked.
    # Visit http://monetdb.cwi.nl/ for further information.
    Listening on port 30000
    MonetDB>
    

    The startup message may be slightly different depending on your operating system and the version of MonetDB you are using.

    MonetDB also creates a MovieDB directory in its dbfarm directory to hold the new database.

    Leave the MonetDB server running for the remainder of the import process. You must be serving the database for any Proximity action that interacts with database data.

  3. Initialize the new Proximity database. (Substitute the appropriate port number if you are using a different port.)

    > cd $PROX_HOME
    > bin/db-util.sh localhost:30000 init-db
    

    Proximity outputs the following trace (leading information showing elapsed time and execution thread has been omitted from the trace for brevity):

    INFO  app.DBUtil: * connecting to db
    INFO  app.DBUtil: * database opened; initializing Prox tables
    INFO  db.DB: * initializing Proximity database
    INFO  app.DBUtil: * tables initialized
    INFO  app.DBUtil: * disconnecting from db
    INFO  app.DBUtil: * done
    

  4. Import the plain text data files into the new Proximity database. (Substitute the appropriate port number if you are using a different port.)

    > bin/import-text.sh localhost:30000 \
      $PROX_HOME/doc/user/tutorial/examples/MovieDB
    

    The plain text data files must be on the same machine as that serving the (still empty) database. You must provide an absolute path to the data files; relative paths cannot be used.

    During import, Proximity reports on the entities being defined (leading information showing elapsed time and execution thread has been omitted from the trace for brevity):

    INFO  app.ImportTextApp: * importing database from 
       /proximity/doc/user/tutorial/examples/MovieDB
    INFO  app.ImportTextApp:   Loading object table
    INFO  app.ImportTextApp:   Loading link table
    INFO  app.ImportTextApp:   Loading attributes
    INFO  app.ImportTextApp:   Loading attribute: O_attr_objtype.data
    INFO  app.ImportTextApp:   Loading attribute: O_attr_title.data
    INFO  app.ImportTextApp:   Loading attribute: O_attr_name.data
    INFO  app.ImportTextApp:   Loading attribute: L_attr_linktype.data
    INFO  app.ImportTextApp:   Loading attribute: C_attr_qgraph_query.data
    INFO  app.ImportTextApp:   Loading containers
    INFO  app.ImportTextApp:   Loading container: si_0
    INFO  app.ImportTextApp:   Loading container attribute: si_0_attr_samplenumber.data
    INFO  app.ImportTextApp:   Loading container: si_1
    INFO  app.ImportTextApp:   Loading container: si_2
    INFO  app.ImportTextApp:   Loading container: si_3
    INFO  app.ImportTextApp: * done importing
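
Steps 3 and 4 above can be combined into a small wrapper that also enforces the absolute-path requirement. This is only a sketch under stated assumptions: the function names abs_dir and import_plain_text and the DRY_RUN variable are hypothetical helpers, not part of Proximity. It assumes PROX_HOME is set, that the MonetDB server from step 2 is already running, and it invokes db-util.sh and import-text.sh exactly as shown in the exercise.

```shell
# Sketch only: wraps steps 3 and 4 of the exercise. abs_dir,
# import_plain_text, and DRY_RUN are hypothetical names.

# Print the absolute path of an existing directory, or fail.
abs_dir() {
    (cd "$1" 2>/dev/null && pwd)
}

# Usage: import_plain_text HOST:PORT DATA_DIR
# Initializes the database, then imports the plain text files found in
# DATA_DIR. With DRY_RUN=1 the commands are printed instead of executed.
import_plain_text() {
    hostport=$1
    # import-text.sh needs an absolute path, so resolve DATA_DIR first.
    data_dir=$(abs_dir "$2") || {
        echo "error: data directory not found: $2" >&2
        return 1
    }
    run=
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo
    fi
    (
        # Run from PROX_HOME, as the exercise does.
        cd "$PROX_HOME" || exit 1
        $run bin/db-util.sh "$hostport" init-db &&
        $run bin/import-text.sh "$hostport" "$data_dir"
    )
}
```

Calling import_plain_text localhost:30000 doc/user/tutorial/examples/MovieDB from $PROX_HOME would then run the same two commands as steps 3 and 4, passing the resolved absolute path to import-text.sh. Setting DRY_RUN=1 first lets you inspect the commands without touching the database.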