Aggregation is the process of grouping data. For example, in Proximity the models aggregate attribute values to create features. Example aggregation functions include average, count, degree, and proportion.
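The four aggregation functions named above can be sketched as follows. This is a minimal illustration, not Proximity's implementation: the function names, the list-based representation of attribute values, and the (from, to) pair representation of links are all assumptions made for the example.

```python
def average(values):
    """Mean of a collection of numeric values; None if there are no values."""
    return sum(values) / len(values) if values else None


def count(values):
    """Number of values, counting repeated values each time they occur."""
    return len(values)


def proportion(values, target):
    """Fraction of values equal to target; None if there are no values."""
    if not values:
        return None
    return sum(1 for v in values if v == target) / len(values)


def degree(links, oid):
    """Number of links that touch the object oid.

    links is a list of hypothetical (from_oid, to_oid) pairs.
    A self-link is counted once in this sketch.
    """
    return sum(1 for a, b in links if a == oid or b == oid)


ages = [30, 40, 50]
genres = ["drama", "comedy", "drama", "drama"]
links = [(1, 2), (3, 1), (2, 3)]

print(average(ages))                # 40.0
print(count(genres))                # 4
print(proportion(genres, "drama"))  # 0.75
print(degree(links, 1))             # 2
```

Each function reduces a variable-size group of values to a single number, which is what allows a model to use it as a feature.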
A query is ambiguous if it admits more than one interpretation. For example, a query with adjacent annotated vertices is ambiguous because neither annotation takes precedence over the other. Ambiguous queries are not permitted in QGraph.
An attribute is a name-value pair that represents additional information about the database entity on which it appears. Proximity attribute values are sets, which may contain zero or more values and which may include specific values more than once. Proximity supports attributes for objects, links, subgraphs, and containers. Objects and links can have a variable number of attributes and attributes can have a variable number of values.
Attribute constraints compare the attribute values of two database entities, such as two objects or two links.
Attribute value conditions restrict query matches to objects or links having the attribute value specified in the condition.
Bias is the systematic difference between the values predicted by the learned model and actual values.
A boundary edge is a query edge that crosses a subquery box.
A boundary vertex is a vertex within a subquery box that is connected to query elements outside the subquery box by a boundary edge.
A numeric annotation of the form [i..j], a bounded range, specifies that there must be at least i and no more than j corresponding elements to match the query.
Conditions and constraints can compare attribute values of the same type (e.g., STR with STR). In addition, you can compare attributes of type DBL with FLT and INT, and attributes of type FLT with INT.
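The type compatibility rule above can be expressed as a small lookup. This is a sketch under stated assumptions: the function name comparable is hypothetical, and the rule is assumed to be symmetric (comparing INT with DBL is treated the same as comparing DBL with INT), which the text implies but does not state outright.

```python
# Pairs of distinct types that may be compared, taken from the rule above.
# frozenset makes each pair order-independent (an assumption of this sketch).
COMPATIBLE_PAIRS = {
    frozenset(("DBL", "FLT")),
    frozenset(("DBL", "INT")),
    frozenset(("FLT", "INT")),
}


def comparable(type_a, type_b):
    """Return True if values of the two attribute types may be compared."""
    return type_a == type_b or frozenset((type_a, type_b)) in COMPATIBLE_PAIRS


print(comparable("STR", "STR"))  # True
print(comparable("INT", "DBL"))  # True
print(comparable("STR", "INT"))  # False
```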
In principle, a complex condition can be any boolean combination of simple conditions. Proximity’s implementation of QGraph restricts complex conditions to disjunctive normal form.
A condition restricts query matches by requiring that database items match specified attribute values. For example, the condition ObjType = person on a vertex requires that the vertex match only objects with an ObjType attribute having a value of person. Existence conditions work similarly, but only require that the corresponding object or link have any value for the specified attribute without caring what that value is.
Constraints compare the attribute values or identities of two distinct query elements. Only pairs of objects or links that satisfy the constraint match the corresponding query.
A container is a collection of subgraphs usually created as the result of executing a query.
The core vertex is the vertex from a QGraph query that corresponds to the object to be classified. Objects and links connected to the core vertex define the local neighborhood to be used in classifying the core object.
A directed edge in a query requires that matching links in the database follow the same direction as the query edge.
A query must be a single connected graph. A query containing more than one connected component is considered to be disconnected and thus not allowed.
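Whether a query forms a single connected graph can be checked with a breadth-first search. The sketch below is illustrative only: the function name is_connected and the representation of a query as a list of vertex names plus a list of (vertex, vertex) edge pairs are assumptions, and edge direction is ignored because connectivity does not depend on it.

```python
from collections import deque


def is_connected(vertices, edges):
    """Return True if every vertex is reachable from every other vertex.

    edges must only mention names that appear in vertices.
    An empty query is treated as not connected (a choice made for this sketch).
    """
    vertices = set(vertices)
    if not vertices:
        return False
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for n in neighbors[v] - seen:
            seen.add(n)
            queue.append(n)
    return seen == vertices


print(is_connected(["A", "B", "C"], [("A", "B"), ("B", "C")]))  # True
print(is_connected(["A", "B", "C"], [("A", "B")]))              # False
```

In the second call, vertex C forms its own connected component, so the query would be considered disconnected.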
Boolean formulae expressed as a disjunction of conjunctions are said to be in disjunctive normal form. Such formulae consist of a series of disjunctions (expressions ORed together) where each expression is either a terminal expression (a simple proposition in the case of boolean logic or a simple condition in the case of Proximity conditions), the negation of a terminal expression, or the conjunction (expressions ANDed together) of terminal expressions.
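A formula in disjunctive normal form is straightforward to evaluate: it holds if any one of its conjunctions holds, and a conjunction holds if all of its terms hold. The sketch below assumes a hypothetical representation in which an item is a dictionary with one value per attribute, and each term is an (attribute, value, negated) triple testing equality. Proximity attribute values are actually sets, so this is a simplification for illustration.

```python
def matches_dnf(item, dnf):
    """Evaluate a condition in disjunctive normal form against an item.

    dnf is a list of conjunctions; each conjunction is a list of
    (attribute, value, negated) triples. A single-term disjunct is
    written as a conjunction containing one triple.
    """
    def term_holds(attribute, value, negated):
        holds = item.get(attribute) == value
        return not holds if negated else holds

    return any(all(term_holds(*term) for term in conj) for conj in dnf)


# (ObjType = movie AND year = 1999) OR (ObjType = studio)
condition = [
    [("ObjType", "movie", False), ("year", 1999, False)],
    [("ObjType", "studio", False)],
]

print(matches_dnf({"ObjType": "movie", "year": 1999}, condition))  # True
print(matches_dnf({"ObjType": "movie", "year": 2001}, condition))  # False
print(matches_dnf({"ObjType": "studio"}, condition))               # True
```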
Proximity uses the terms vertex and edge to refer to entities in a query and the terms object and link to refer to entities in the data. An edge in a query matches corresponding links in the data.
An exact annotation requires a specific number of matches, for example [2], rather than a range in the number of matches such as [2..4] or [2..].
Existence conditions check to see if the corresponding object or link has a value for the specified attribute. An existence condition is satisfied if the corresponding item has any value for the attribute, regardless of what that value is.
Identity constraints compare the identity (OID) of two database entities.
The inner structure of a subquery is the set of vertices and edges that fall entirely within the subquery box. The boundary edge and subquery annotation are not part of the inner structure.
Isomorphic subgraphs have the same structure in terms of nodes and edges but may have different member objects and links.
Knowledge discovery seeks to find useful patterns in large and complex databases. More specifically, relational knowledge discovery focuses on constructing useful statistical models from data about complex relationships among people, places, things, and events.
A link is a directed binary relation connecting two objects in a Proximity database. We use link to refer to relations in a database and edge to refer to relations in a query.
See self-link.
A mirror match to a QGraph query is a pair of otherwise identical subgraphs that differ only in terms of how the subgraph’s objects and links match the query’s vertices and edges. Specifically, in a mirror match, two objects reverse positions from their original locations in the other matching subgraph.
A multi-dimensional (or multi-column) attribute in Proximity contains more than one value. For example, a location attribute might contain two values corresponding to the x and y coordinates of the item’s position. Proximity permits the inclusion of multi-dimensional attributes in data but does not yet support their use in queries or models.
Each vertex and edge in a Proximity query is assigned a name (label) that is used to identify the corresponding items in the matching subgraphs.
A negated query element (vertex or edge) has a numeric annotation of [0]; there must be no corresponding element in the data when matching the query.
Numeric annotations place limits on the number of isomorphic substructures that can occur in matching portions of the database. Annotations also serve to group isomorphic structures into a single subgraph rather than producing multiple matches.
An object is a Proximity database entity that represents things in the world such as people, places, and events.
Optional query elements define structures that can be, but are not required to be, present in the data in order to match the query. They are annotated with [0..n] or [0..].
Precedence determines the order in which query elements are considered in matching the database. An annotated vertex has precedence over an annotated edge because the match process first finds objects that match the annotated vertex and then finds links from that object to match the corresponding edges in the query.
Prefix notation places an expression’s operator before its operands. For example, the expression a + b becomes + a b when using prefix notation.
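Because the operator always comes first, a prefix expression can be evaluated with a simple recursive reader and needs no parentheses. The following sketch is illustrative only and is not tied to how Proximity parses anything: the function name eval_prefix is hypothetical, and it assumes whitespace-separated tokens, numeric operands, and the four binary arithmetic operators.

```python
import operator

# Binary operators supported by this sketch.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_prefix(tokens):
    """Evaluate a list of prefix-notation tokens and return a float."""
    it = iter(tokens)

    def parse():
        tok = next(it)
        if tok in OPS:
            # The operator is followed by its two operands,
            # each of which may itself be a prefix expression.
            left = parse()
            right = parse()
            return OPS[tok](left, right)
        return float(tok)

    result = parse()
    if next(it, None) is not None:
        raise ValueError("unexpected trailing tokens")
    return result


print(eval_prefix("+ 1 2".split()))      # 3.0
print(eval_prefix("* + 1 2 3".split()))  # 9.0, i.e. (1 + 2) * 3
```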
Propositionalizing data “flattens” relational data by moving attributes of related items to the objects of interest. For example, in a system that reasons about movies, propositionalizing an attribute of a director, such as date-of-birth, might place that attribute on related movie objects as director-date-of-birth.
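The movie example above can be sketched in a few lines. Everything here is hypothetical and chosen for illustration: the function name propositionalize, the dictionary-based data layout, the identifiers m1, m2, and d1, and the sample attribute values are not drawn from any real data set or from Proximity's API.

```python
def propositionalize(movies, directors, directed_by, attr):
    """Copy attr from each movie's director onto a flat copy of the movie.

    movies and directors map an id to a dict of attributes.
    directed_by maps a movie id to its director's id.
    The copied attribute is named "director-<attr>".
    """
    flat = {}
    for movie_id, movie_attrs in movies.items():
        row = dict(movie_attrs)  # copy so the input data is left unchanged
        director_id = directed_by.get(movie_id)
        director_attrs = directors.get(director_id, {})
        if attr in director_attrs:
            row["director-" + attr] = director_attrs[attr]
        flat[movie_id] = row
    return flat


movies = {"m1": {"title": "Example Movie"}, "m2": {"title": "Orphan Movie"}}
directors = {"d1": {"date-of-birth": "1950-01-01"}}
directed_by = {"m1": "d1"}

flat = propositionalize(movies, directors, directed_by, "date-of-birth")
print(flat["m1"])  # {'title': 'Example Movie', 'director-date-of-birth': '1950-01-01'}
print(flat["m2"])  # {'title': 'Orphan Movie'}
```

The result is one flat record per movie, which is the form that non-relational learning algorithms typically expect.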
QGraph is a visual query language designed to support knowledge discovery in large graph databases.
As used in this document, relational data refers to data that explicitly represent relations among objects as first-class entities. Relational data are represented by a directed graph in which nodes represent objects from the domain of interest and links represent relationships between pairs of objects.
A database’s schema determines how the data are represented, i.e., which data entities are mapped to objects, which are mapped to links, and what constitutes attributes of those objects and links. Proximity also uses an internal schema that determines how Proximity database structures map to MonetDB data structures.
A self-link connects a node with itself. For example, web pages often contain hyperlinks that jump to another part of the current page, linking the web page to itself.
A one-dimensional star query includes a core vertex and one or more neighboring vertices, each connected to the core vertex by a single edge. Typically, the neighboring vertices (and therefore the corresponding edges as well) are annotated with an unbounded range, permitting any number of matching neighbor objects and links. Star queries can be extended to additional dimensions through the use of subqueries.
A subgraph is a connected portion of a graph. QGraph queries return subgraphs as matches to the query.
A subquery is a connected subgraph of vertices and edges that can be treated as a logical unit. Subqueries allow grouping and limiting of complex query structures rather than just individual query elements.
A type is a label that categorizes instances in a data set, usually represented as an attribute-value pair assigned to an object or link. For example, a data set might contain objects that represent three types of entities: actors, movies, and studios. Proximity does not require a type attribute, but users may specify zero, one, or many attributes that provide type information. These attributes can be practical for the user, but in fact Proximity does not distinguish attributes representing type information from attributes representing other kinds of information.
A numeric annotation of the form [i..], an unbounded range, specifies that there must be at least i corresponding element(s) to match the query.
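The numeric annotation forms described in this glossary (exact [i], bounded range [i..j], and unbounded range [i..]) can all be reduced to a lower bound and an optional upper bound. The sketch below is an assumption-laden illustration working on the textual form of an annotation; the function names parse_annotation and satisfies are hypothetical, and Proximity itself stores queries as XML rather than parsing strings like these.

```python
import re

# Matches "[i]", "[i..j]", and "[i..]" where i and j are non-negative integers.
_ANNOTATION = re.compile(r"^\[(\d+)(?:\.\.(\d*))?\]$")


def parse_annotation(text):
    """Return (lower, upper) for an annotation; upper is None if unbounded."""
    m = _ANNOTATION.match(text.strip())
    if m is None:
        raise ValueError("not a numeric annotation: %r" % text)
    lower = int(m.group(1))
    upper = m.group(2)
    if upper is None:   # exact annotation such as "[2]"
        return lower, lower
    if upper == "":     # unbounded range such as "[1..]"
        return lower, None
    if int(upper) < lower:
        raise ValueError("upper bound below lower bound: %r" % text)
    return lower, int(upper)


def satisfies(match_count, annotation):
    """Return True if match_count falls within the annotation's limits."""
    lower, upper = parse_annotation(annotation)
    return match_count >= lower and (upper is None or match_count <= upper)


print(parse_annotation("[2]"))     # (2, 2)
print(parse_annotation("[1..3]"))  # (1, 3)
print(parse_annotation("[1..]"))   # (1, None)
print(satisfies(0, "[0]"))         # True: a negated element must have no match
print(satisfies(4, "[1..3]"))      # False
```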
An undirected edge in a query matches links in the database regardless of the link’s direction.
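The difference between directed and undirected edges comes down to whether the reversed link is also accepted. In this minimal sketch, the function name edge_matches and the representation of a link as a (from_object, to_object) pair are assumptions made for illustration.

```python
def edge_matches(link, from_obj, to_obj, directed):
    """Return True if link can match a query edge from from_obj to to_obj.

    A directed edge accepts only a link running in the same direction.
    An undirected edge also accepts a link running the opposite way.
    """
    src, dst = link
    if (src, dst) == (from_obj, to_obj):
        return True
    return (not directed) and (src, dst) == (to_obj, from_obj)


link = ("actor1", "movie1")  # a link from actor1 to movie1

print(edge_matches(link, "actor1", "movie1", directed=True))   # True
print(edge_matches(link, "movie1", "actor1", directed=True))   # False
print(edge_matches(link, "movie1", "actor1", directed=False))  # True
```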
Validation is the process of ensuring that an XML document obeys the structure specified in the associated DTD. In Proximity, queries (which are represented internally in XML) must validate against the DTD in graph-query.dtd. Because DTDs cannot specify semantic content or enforce all potential syntactic requirements, a syntactically valid query may still be illegal under the rules of QGraph.
Proximity uses the terms vertex and edge to refer to entities in a query and the terms object and link to refer to entities in the data. A vertex in a query matches corresponding objects in the data.
A well-formed query conforms to all rules governing how queries may be legally structured.