Appendix D. Proximity Text Data Format

This appendix describes the plain text format for importing data into and exporting data out of Proximity. This format is commonly used to export data from Proximity and then re-import it into a new database that requires a different format, such as when upgrading to an incompatible version of MonetDB.

[Caution]

The utilities that use the Proximity plain text data format perform no error checking. Although this format requires less disk space than the XML format and its use can improve import and export speed, users are solely responsible for maintaining data consistency when using this format.

All files referenced in the set of data files must be present and in the same directory. For example, the containers.data file references data files (e.g., si_0_attrs.data) that define the attributes for each container’s subgraphs. If the specified container has no subgraph attributes, these files may be empty but they must still be present. The required files are created automatically during export but may need to be constructed by hand for some import applications.
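
Because the import utilities perform no error checking, it can help to confirm that every expected file is present before importing. The following Python sketch shows one way to do that. The function name missing_data_files is a hypothetical helper, not part of Proximity, and the list of expected filenames must be supplied by the caller, since it depends on what the data files reference.

```python
from pathlib import Path


def missing_data_files(data_dir, expected_names):
    """Return the expected filenames that are not regular files in data_dir.

    Hypothetical helper, not part of Proximity. An empty file still counts
    as present, matching the rule that files may be empty but must exist.
    """
    base = Path(data_dir)
    return [name for name in expected_names if not (base / name).is_file()]
```

For example, calling missing_data_files("export_dir", ["objects.data", "links.data"]) returns the subset of those two names that cannot be found in the (assumed) directory export_dir.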

In addition to handling complete databases, Proximity can import and export individual attributes and containers to and from existing databases. Users are solely responsible for ensuring that imported attribute and container data correctly matches the identifiers and data types in the existing database.

See Chapter 3, Importing and Exporting Proximity Data, for details on using the import-text.sh and export-text.sh scripts (import-text.bat and export-text.bat for Windows) to import and export data using this format.

The examples in this appendix are designed to illustrate the relevant data format and are not intended to represent, in whole or in part, a valid or semantically meaningful database.

Overview

The Proximity text data format stores data in multiple files. One set of files stores structure information:

objects.data stores the IDs for objects in the database
links.data stores the IDs and starting and ending point IDs for links in the database

Another set of files stores attribute data:

attributes.data stores the name, item type (object, link, or container) and data type for attributes in the database and includes pointers to the files containing the corresponding attribute values; subgraph attribute data are stored in a separate set of files
O_attr_attrname.data stores the object IDs and attribute values for the attrname object attribute
L_attr_attrname.data stores the link IDs and attribute values for the attrname link attribute
C_attr_attrname.data stores the container IDs and attribute values for the attrname container attribute

The names for the attribute data files are suggested conventions, which Proximity uses for naming exported data files. Proximity uses whatever filenames you provide in attributes.data when importing data.
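
When constructing import files by hand, the export naming convention above can be reproduced with a small helper. This Python sketch is illustrative only: the function attribute_file_name and the mapping ITEM_TYPE_PREFIX are assumptions introduced here, and the O, L, and C prefixes are taken from the file names listed above.

```python
# Prefixes taken from the conventional export file names:
# O_attr_*, L_attr_*, and C_attr_* for object, link, and container attributes.
ITEM_TYPE_PREFIX = {"object": "O", "link": "L", "container": "C"}


def attribute_file_name(item_type, attr_name):
    """Build the conventional data file name for one attribute.

    Hypothetical helper. Raises KeyError for an unknown item type.
    """
    prefix = ITEM_TYPE_PREFIX[item_type]
    return f"{prefix}_attr_{attr_name}.data"
```

For an object attribute with the made-up name age, attribute_file_name("object", "age") yields O_attr_age.data. Any other filename works equally well for import, provided attributes.data points to it.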

Finally, container data (subgraph members and attributes) are stored in a separate series of files:

containers.data stores the ID, name, and location in the container hierarchy of the containers in the database and includes pointers to the files storing subgraph data for these containers
si_n.objects.data stores the object ID, subgraph ID, and label assigned by the originating query of the objects in container n
si_n.links.data stores the link ID, subgraph ID, and label assigned by the originating query of the links in container n
si_n_attrs.data stores the attribute name and data type of subgraph attributes in container n and points to the files containing the corresponding attribute values
si_n_attrname.data stores the subgraph IDs and values for subgraph attribute attrname in container n
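
The per-container file names listed above can be generated in the same way. The Python sketch below is a hypothetical helper, not part of Proximity. It copies the name patterns exactly as they are written in the list above, including the separators, and assumes that n is the container index used in those names and that the caller knows the subgraph attribute names.

```python
def container_file_names(n, subgraph_attr_names=()):
    """List the conventional data file names for container n.

    Hypothetical helper. The patterns mirror the names listed above;
    one value file is added per subgraph attribute name supplied.
    """
    names = [
        f"si_{n}.objects.data",  # objects in container n
        f"si_{n}.links.data",    # links in container n
        f"si_{n}_attrs.data",    # subgraph attribute definitions
    ]
    names += [f"si_{n}_{attr}.data" for attr in subgraph_attr_names]
    return names
```

For container 0 with no subgraph attributes, this produces the three names si_0.objects.data, si_0.links.data, and si_0_attrs.data. The last of these may be an empty file but must still exist.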