Glossary

aggregation

Aggregation is the process of grouping data. For example, in Proximity the models aggregate attribute values to create features. Example aggregation functions include average, count, degree, and proportion.
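
As a rough illustration only (this is not Proximity's API), the sketch below reduces a list of attribute values to a single feature using three of the functions named above. The function name aggregate, its parameters, and the sample data are hypothetical. Degree is omitted because it depends on graph structure rather than on a list of values.

```python
from statistics import mean

def aggregate(values, function, target=None):
    """Reduce a list of attribute values to a single feature value.

    Hypothetical helper for illustration; not part of Proximity.
    """
    if function == "count":
        return len(values)
    if function == "average":
        return mean(values) if values else None
    if function == "proportion":
        # Fraction of the values that equal `target`.
        return sum(v == target for v in values) / len(values) if values else None
    raise ValueError(f"unknown aggregation function: {function}")

# Made-up attribute values gathered from the movies linked to one studio.
years = [1994, 1999, 2003, 1999]
genres = ["drama", "comedy", "drama", "drama"]

print(aggregate(years, "count"))
print(aggregate(years, "average"))
print(aggregate(genres, "proportion", target="drama"))
```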

ambiguous

A query is ambiguous if it admits more than one interpretation. For example, a query with adjacent annotated vertices is ambiguous because neither annotation takes precedence over the other. Ambiguous queries are not permitted in QGraph.

attribute

An attribute is a name-value pair that represents additional information about the database entity on which it appears. Proximity attribute values are sets, which may contain zero or more values and which may include specific values more than once. Proximity supports attributes for objects, links, subgraphs, and containers. Objects and links can have a variable number of attributes and attributes can have a variable number of values.

attribute constraint

Attribute constraints compare the attribute values of two database entities, such as two objects or two links.

attribute value condition

Attribute value conditions restrict query matches to objects or links having the attribute value specified in the condition.

bias

Bias is the systematic difference between the values predicted by the learned model and actual values.

boundary edge

A boundary edge is a query edge that crosses a subquery box.

boundary vertex

A boundary vertex is a vertex within a subquery box that is connected to query elements outside the subquery box by a boundary edge.

bounded range

A numeric annotation of the form [i..j], a bounded range specifies that there must be at least i and no more than j corresponding elements to match the query.

comparable types

Conditions and constraints can compare attribute values of the same type (e.g., STR with STR). In addition, you can compare attributes of type DBL with FLT and INT, and attributes of type FLT with INT.
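
The rule above can be sketched as a small lookup. The type names come from this entry; the function comparable and the table NUMERIC_PAIRS are hypothetical and do not reflect how Proximity implements the check.

```python
# Cross-type pairs listed in the entry above. Same-type pairs are always comparable.
NUMERIC_PAIRS = {
    frozenset({"DBL", "FLT"}),
    frozenset({"DBL", "INT"}),
    frozenset({"FLT", "INT"}),
}

def comparable(type_a, type_b):
    """Return True if attribute values of the two types may be compared."""
    return type_a == type_b or frozenset({type_a, type_b}) in NUMERIC_PAIRS

print(comparable("STR", "STR"))
print(comparable("INT", "DBL"))
print(comparable("STR", "INT"))
```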

complex condition

In principle, a complex condition can be any boolean combination of simple conditions. Proximity’s implementation of QGraph restricts complex conditions to disjunctive normal form.

condition

A condition restricts query matches by requiring that database items match specified attribute values. For example, the condition ObjType = person on a vertex requires that the vertex match only objects with an ObjType attribute having a value of person. Existence conditions work similarly, but require only that the corresponding object or link have some value for the specified attribute, regardless of what that value is.

constraint

Constraints compare the attribute values or identities of two distinct query elements. Only pairs of objects or links that satisfy the constraint match the corresponding query.

container

A container is a collection of subgraphs usually created as the result of executing a query.

core vertex

The core vertex is the vertex from a QGraph query that corresponds to the object to be classified. Objects and links connected to the core vertex define the local neighborhood to be used in classifying the core object.

directed edge

A directed edge in a query requires that matching links in the database follow the same direction as the query edge.

disconnected query

A query must be a single connected graph. A query containing more than one connected component is considered to be disconnected and thus not allowed.
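
One way to test this property is a breadth-first search over the query's vertices, ignoring edge direction. The sketch below is an assumption about how such a check could be written, not Proximity's own validation code; the function name and data layout are hypothetical.

```python
from collections import deque

def is_connected(vertices, edges):
    """Return True if the query graph forms a single connected component.

    `vertices` is a list of vertex names and `edges` a list of (a, b) pairs.
    Edge direction is ignored. An empty query is treated as connected,
    which is an assumption made for this sketch.
    """
    vertices = list(vertices)
    if not vertices:
        return True
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        for n in neighbors[queue.popleft()]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return len(seen) == len(vertices)

print(is_connected(["A", "B", "C"], [("A", "B"), ("B", "C")]))
print(is_connected(["A", "B", "C"], [("A", "B")]))
```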

disjunctive normal form

Boolean formulae expressed as a disjunction of conjunctions are said to be in disjunctive normal form. Such a formula is a disjunction (a series of expressions ORed together) in which each expression is either a terminal expression (a simple proposition in the case of boolean logic or a simple condition in the case of Proximity conditions), the negation of a terminal expression, or the conjunction (expressions ANDed together) of terminal expressions.
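
A minimal sketch of evaluating a condition in this form is shown below. It assumes a simplified representation: the condition is a list of conjunctions, each conjunction is a list of (attribute name, required value) pairs, and negation is left out. The function name and data layout are hypothetical and are not Proximity's internal representation.

```python
def satisfies_dnf(attributes, dnf):
    """Return True if any conjunction in `dnf` holds for `attributes`.

    `attributes` maps an attribute name to a list of its values.
    `dnf` is a list of conjunctions; each conjunction is a list of
    (name, value) pairs that must all hold.
    """
    def holds(name, value):
        return value in attributes.get(name, ())
    return any(all(holds(name, value) for name, value in conj) for conj in dnf)

# Hypothetical object: ObjType = person, gender = F.
person = {"ObjType": ["person"], "gender": ["F"]}

# (ObjType = movie) OR (ObjType = person AND gender = F)
query = [[("ObjType", "movie")], [("ObjType", "person"), ("gender", "F")]]
print(satisfies_dnf(person, query))
```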

edge

Proximity uses the terms vertex and edge to refer to entities in a query and the terms object and link to refer to entities in the data. An edge in a query matches corresponding links in the data.

exact annotation

An exact annotation requires a specific number of matches, for example [2], rather than a range in the number of matches such as [2..4] or [2..].

existence condition

Existence conditions check to see if the corresponding object or link has a value for the specified attribute. An existence condition is satisfied if the corresponding item has any value for the attribute, regardless of what that value is.

identity constraint

Identity constraints compare the identity (OID) of two database entities.

inner structure

The inner structure of a subquery is the set of vertices and edges that fall entirely within the subquery box. The boundary edge and subquery annotation are not part of the inner structure.

isomorphic

Isomorphic subgraphs have the same structure in terms of nodes and edges but may have different member objects and links.

knowledge discovery

Knowledge discovery seeks to find useful patterns in large and complex databases. More specifically, relational knowledge discovery focuses on constructing useful statistical models from data about complex relationships among people, places, things, and events.

link

A link is a directed binary relation connecting two objects in a Proximity database. We use link to refer to relations in a database and edge to refer to relations in a query.

loop

See self link.

mirror match

A mirror match to a QGraph query is a pair of otherwise identical subgraphs that differ only in terms of how the subgraph’s objects and links match the query’s vertices and edges. Specifically, in a mirror match, two objects reverse positions from their original locations in the other matching subgraph.

multi-dimensional attribute

A multi-dimensional (or multi-column) attribute in Proximity contains more than one value. For example, a location attribute might contain two values corresponding to the x and y coordinates of the item’s position. Proximity permits the inclusion of multi-dimensional attributes in data but does not yet support their use in queries or models.

name

Each vertex and edge in a Proximity query is assigned a name (label) that is used to identify the corresponding items in the matching subgraphs.

negated element

A negated query element (vertex or edge) has a numeric annotation of [0]; there must be no corresponding element in the data when matching the query.

numeric annotation

Numeric annotations place limits on the number of isomorphic substructures that can occur in matching portions of the database. Annotations also serve to group isomorphic structures into a single subgraph rather than producing multiple matches.
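
To make the limits concrete, the sketch below parses the three annotation forms described in this glossary ([i], [i..j], and [i..]) into a minimum and maximum count. The regular expression and the function names are hypothetical; Proximity's own parser may differ.

```python
import re

# Matches [i], [i..j], and [i..] where i and j are non-negative integers.
ANNOTATION = re.compile(r"^\[(\d+)(?:(\.\.)(\d+)?)?\]$")

def parse_annotation(text):
    """Return (minimum, maximum); maximum is None for an unbounded range."""
    match = ANNOTATION.match(text.strip())
    if match is None:
        raise ValueError(f"not a numeric annotation: {text!r}")
    low = int(match.group(1))
    if match.group(2) is None:
        # Exact form [i]: minimum and maximum coincide.
        return low, low
    high = int(match.group(3)) if match.group(3) is not None else None
    if high is not None and high < low:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return low, high

def allows(text, count):
    """Return True if `count` matching elements satisfies the annotation."""
    low, high = parse_annotation(text)
    return count >= low and (high is None or count <= high)

print(parse_annotation("[2]"))
print(parse_annotation("[2..4]"))
print(parse_annotation("[0..]"))
```

Under this reading, a negated element ([0]) allows only a count of zero, and an optional element ([0..]) allows any count.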

object

An object is a Proximity database entity that represents things in the world such as people, places, and events.

optional element

Optional query elements define structures that can be, but are not required to be, present in the data in order to match the query. They are annotated with [0..n] or [0..].

precedence

Precedence determines the order in which query elements are considered in matching the database. An annotated vertex has precedence over an annotated edge because the match process first finds objects that match the annotated vertex and then finds links from that object to match the corresponding edges in the query.

prefix notation

Prefix notation places an expression’s operator before its operands. For example, the expression a + b becomes + a b when using prefix notation.
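
As an illustration, the following sketch evaluates arithmetic written in prefix notation. It assumes space-separated tokens and binary operators only; the function name eval_prefix is hypothetical and unrelated to how Proximity processes its own expressions.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def eval_prefix(tokens):
    """Evaluate a list of prefix-notation tokens with binary operators."""
    def parse(pos):
        token = tokens[pos]
        if token in OPS:
            # The operator comes first, followed by its two operands.
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            return OPS[token](left, right), pos
        return float(token), pos + 1

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return value

print(eval_prefix("+ 1 2".split()))      # infix: 1 + 2
print(eval_prefix("* + 1 2 4".split()))  # infix: (1 + 2) * 4
```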

propositionalizing data

Propositionalizing data “flattens” relational data by moving attributes of related items to the objects of interest. For example, in a system that reasons about movies, propositionalizing an attribute of a director, such as date-of-birth, might place that attribute on related movie objects as director-date-of-birth.
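
The movie example above might look like the following sketch. The dictionaries, identifiers, and the propositionalize function are all made up for illustration and do not correspond to Proximity data structures.

```python
def propositionalize(movies, directors, directed_by, attribute):
    """Copy `attribute` from each movie's director onto the movie record.

    `movies` and `directors` map IDs to attribute dictionaries;
    `directed_by` maps a movie ID to its director's ID.
    """
    flattened = {}
    for movie_id, movie in movies.items():
        row = dict(movie)  # copy so the input records are not modified
        director_id = directed_by.get(movie_id)
        if director_id is not None:
            row[f"director-{attribute}"] = directors[director_id].get(attribute)
        flattened[movie_id] = row
    return flattened

# Hypothetical data: one movie with a known director, one without.
movies = {"m1": {"title": "Movie A"}, "m2": {"title": "Movie B"}}
directors = {"d1": {"date-of-birth": "1950-01-01"}}
directed_by = {"m1": "d1"}

flat = propositionalize(movies, directors, directed_by, "date-of-birth")
print(flat["m1"])
```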

QGraph

QGraph is a visual query language designed to support knowledge discovery in large graph databases.

relational data

As used in this document, relational data refers to data that explicitly represent relations among objects as first-class entities. Relational data are represented by a directed graph in which nodes represent objects from the domain of interest and links represent relationships between pairs of objects.

schema

A database’s schema determines how the data are represented, i.e., which data entities are mapped to objects, which are mapped to links, and what constitutes attributes of those objects and links. Proximity also uses an internal schema that determines how Proximity database structures map to MonetDB data structures.

self link

A self link connects a node with itself. For example, web pages often contain hyperlinks that jump to another part of the current page, linking the web page to itself.

star query

A one-dimensional star query includes a core vertex and one or more neighboring vertices, each connected to the core vertex by a single edge. Typically, the neighboring vertices (and therefore the corresponding edges as well) are annotated with an unbounded range, permitting any number of matching neighbor objects and links. Star queries can be extended to additional dimensions through the use of subqueries.

subgraph

A subgraph is a connected portion of a graph. QGraph queries return subgraphs as matches to the query.

subquery

A subquery is a connected subgraph of vertices and edges that can be treated as a logical unit. Subqueries allow grouping and limiting of complex query structures rather than just individual query elements.

type

A type is a label that categorizes instances in a data set, usually represented as an attribute-value pair assigned to an object or link. For example, a data set might contain objects that represent three types of entities: actors, movies, and studios. Proximity does not require a type attribute, but users may specify zero, one, or many attributes that provide type information. These attributes can be practical for the user, but in fact Proximity does not distinguish attributes representing type information from attributes representing other kinds of information.

unbounded range

A numeric annotation of the form [i..], an unbounded range specifies that there must be at least i corresponding element(s) to match the query.

undirected edge

An undirected edge in a query matches links in the database regardless of the link’s direction.
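
The difference between directed and undirected edges can be sketched as a single matching test. The function edge_matches and its arguments are hypothetical; a link is represented here simply as a (source, target) pair of object IDs.

```python
def edge_matches(link, start_obj, end_obj, directed):
    """Return True if `link` can match a query edge from start_obj to end_obj.

    A directed edge requires the link to run in the same direction as the
    query edge; an undirected edge accepts either direction.
    """
    source, target = link
    if (source, target) == (start_obj, end_obj):
        return True
    return not directed and (source, target) == (end_obj, start_obj)

link = ("o2", "o1")  # a database link from object o2 to object o1
print(edge_matches(link, "o1", "o2", directed=True))
print(edge_matches(link, "o1", "o2", directed=False))
```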

validation

Validation is the process of ensuring that an XML document obeys the structure specified in the associated DTD. In Proximity, queries (which are represented internally in XML) must validate against the DTD in graph-query.dtd. Because DTDs cannot specify semantic content or enforce all potential syntactic requirements, a syntactically valid query may still be illegal under the rules of QGraph.

vertex

Proximity uses the terms vertex and edge to refer to entities in a query and the terms object and link to refer to entities in the data. A vertex in a query matches corresponding objects in the data.

well formed

A well-formed query conforms to all rules governing how queries may be legally structured.