Importing database elements using XML

In addition to importing a complete database, Proximity lets you import individual containers and attributes. This lets Proximity users share data and results and store data offline. To add data to an existing database, run the import-xml.sh script (import-xml.bat for Windows) on a Proximity XML data file containing the new data.

By default, you cannot import objects or links into an existing database that already contains any items of that type. That is, you cannot import additional objects if the database already contains at least one object, and you cannot import additional links if it already contains at least one link. Similarly, you can only import new attributes: once an attribute has been defined for a database, you cannot import additional values for that attribute. You can override this behavior with the noChecks flag, described in “Importing XML data using noChecks”.
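
The default rule can be summarized as a small decision function. The sketch below only models the behavior described in the preceding paragraph; it is not Proximity code. The function name import_allowed and its arguments are hypothetical, and noChecks is represented as a plain 0/1 value because the way the flag is actually supplied is covered in the noChecks section, not here.

```shell
#!/bin/sh
# Illustrative model of the default import rule; this is NOT Proximity code.
# Usage: import_allowed KIND EXISTING [NOCHECKS]
#   KIND      object | link | attribute
#   EXISTING  for object/link: number of items of that type already in the
#             database; for attribute: 1 if an attribute of that name is
#             already defined, otherwise 0
#   NOCHECKS  1 if the noChecks override is in effect (default 0)
import_allowed() {
    kind=$1
    existing=$2
    nochecks=${3:-0}

    case $kind in
        object|link|attribute) ;;
        *) echo "unknown kind: $kind" >&2; return 2 ;;
    esac

    # With noChecks in effect, the rule is bypassed entirely.
    if [ "$nochecks" -eq 1 ]; then
        return 0
    fi

    # Otherwise the import is allowed only when nothing of that kind
    # (or, for attributes, nothing of that name) exists yet.
    [ "$existing" -eq 0 ]
}
```

For example, import_allowed object 0 succeeds, import_allowed object 5 fails, and import_allowed object 5 1 succeeds because the override is on.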

Imported containers are always created at the top level (directly under the root container), regardless of where the container lived in the source database. Any nested containers within the imported container retain their relative nesting, however. For example, if you exported the /1d-clusters/samples container, which includes nested containers /1d-clusters/samples/0 and /1d-clusters/samples/1, and later imported that container into another database, the destination database ends up with the containers /samples/0 and /samples/1 without the parent 1d-clusters container, regardless of whether the destination database already includes a 1d-clusters container. (The container hierarchy notation used in this paragraph is explained in “Exporting Data to XML”.)
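
The path mapping described above amounts to stripping the exported container's parent path. The following sketch illustrates that rule with plain shell string handling; the function name imported_path is hypothetical, and the script does not talk to Proximity at all.

```shell
#!/bin/sh
# Map a container path in the source database to the path it has after
# import. EXPORTED is the container that was exported; NESTED is that
# container or any container nested inside it.
# Usage: imported_path EXPORTED NESTED
imported_path() {
    exported=${1%/}          # drop a trailing slash, if any
    nested=$2
    parent=${exported%/*}    # e.g. /1d-clusters/samples -> /1d-clusters

    case $nested in
        "$exported"|"$exported"/*)
            # Strip the parent prefix so the exported container lands
            # directly under the root container.
            result=${nested#"$parent"}
            printf '%s\n' "$result"
            ;;
        *)
            echo "$nested is not inside $exported" >&2
            return 1
            ;;
    esac
}

imported_path /1d-clusters/samples /1d-clusters/samples/0   # prints /samples/0
imported_path /1d-clusters/samples /1d-clusters/samples/1   # prints /samples/1
imported_path /1d-clusters/samples /1d-clusters/samples     # prints /samples
```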

[Caution]

When importing attributes and containers, you are responsible for ensuring that object, link, subgraph, and container identifiers match those in the existing database. Proximity makes no checks to ensure that attributes are assigned to items that are actually present in the database. Errors in identifiers may result in inaccurate data being stored in the database.

The following exercise walks through the process of importing a new attribute, url_hierarchy3, and its values into the existing ProxWebKB database. This attribute stores the third directory in the path after the domain name, extracted from the object’s URL. We can import this attribute into an existing database because the ProxWebKB database created in Exercise 3.1 does not include any values for the url_hierarchy3 attribute.

Exercise 3.2. Importing attribute values using XML:

You must complete Exercise 3.1 before running this exercise. Before beginning, make sure that you are serving the ProxWebKB database (created in Exercise 3.1) using Mserver. The data files must reside on the same machine that serves the database.

  1. Uncompress the file containing the url_hierarchy3 attribute values.

    > cd $PROX_HOME/doc/user/tutorial/examples
    > gunzip url_hierarchy3_attr.xml.gz
    
  2. Change to the $PROX_HOME directory.

  3. Import the url_hierarchy3 attribute data. (Substitute the appropriate port number if you are using a different port.)

    > bin/import-xml.sh localhost:30000 \
      $PROX_HOME/doc/user/tutorial/examples/url_hierarchy3_attr.xml
    

    When the import process is finished, Proximity reports on the number of database entities created.

    INFO  app.ImportXMLApp: * done importing; counts:
       0 objects, 0 links, 1 attributes, 868 attribute values,
       0 containers, 0 subgraph items
    

    Because many URLs do not include three levels of directories after the domain name, only 868 of the 4135 objects (about 21%) have a value for this attribute.
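
If you repeat this exercise often, the three steps can be wrapped in a small shell function. This is a minimal sketch that assumes the file locations used in the exercise; the function name import_url_hierarchy3 and the RUN variable (set RUN=echo to print the import command instead of running it) are hypothetical conveniences, not part of Proximity.

```shell
#!/bin/sh
# Sketch of steps 1-3 as one function.
# Usage: import_url_hierarchy3 PROX_HOME [HOST:PORT]
# RUN is a hypothetical dry-run hook: RUN=echo prints the import command.
import_url_hierarchy3() {
    prox_home=$1
    server=${2:-localhost:30000}
    data=$prox_home/doc/user/tutorial/examples/url_hierarchy3_attr.xml

    # Step 1: uncompress the data file, unless that was already done.
    if [ -f "$data.gz" ]; then
        gunzip "$data.gz" || return 1
    fi
    if [ ! -f "$data" ]; then
        echo "missing data file: $data" >&2
        return 1
    fi

    # Steps 2 and 3: run the import from the Proximity home directory.
    # The subshell leaves the caller's working directory unchanged.
    ( cd "$prox_home" && ${RUN:-} bin/import-xml.sh "$server" "$data" )
}
```

A typical call would be import_url_hierarchy3 "$PROX_HOME" localhost:30000, substituting your own port number if it differs. Checking for the uncompressed file first makes the function safe to rerun after gunzip has already removed the .gz file.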