A numeric annotation specifies how many database entities must match a particular query element. Both vertices and edges may be annotated. (Subqueries are also annotated, as described in Chapter 6, Subqueries.) Numeric annotations can specify a range of values, giving them a great deal more flexibility than the alternative of specifying exact structural matches.
There are three legal forms of numeric annotation:
An unbounded range
[i..] on a vertex or edge means that at
least i instances of the corresponding
database element must be present to match the query. An unbounded
range matches any number of database elements greater than or
equal to i.
A bounded range
[i..j] means
that at least i and no more than
j instances are required for a
match. The query will not match database structures that have
fewer than i or more than j matching elements.
An exact annotation
[i] means that exactly
the specified number of database elements must be present to match
the query. For example, if you specify a vertex annotation of
[2], the query will not match database
structures that have one, three, or more matching vertices.
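The three forms can be modeled outside Proximity with a small Python sketch. The Annotation class and its parse and allows methods are hypothetical names invented for this illustration, not part of Proximity, and rejecting a bounded range whose upper bound is below its lower bound is our assumption rather than documented behavior.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """Illustrative model of a numeric annotation (not a Proximity class)."""

    lower: int
    upper: Optional[int]  # None means the range is unbounded

    @classmethod
    def parse(cls, text: str) -> "Annotation":
        # Accepts the three forms described above: [i..], [i..j], and [i].
        match = re.fullmatch(r"\[(\d+)(\.\.(\d*))?\]", text.strip())
        if match is None:
            raise ValueError(f"not a numeric annotation: {text!r}")
        lower = int(match.group(1))
        if match.group(2) is None:   # exact form [i]
            return cls(lower, lower)
        if match.group(3) == "":     # unbounded form [i..]
            return cls(lower, None)
        upper = int(match.group(3))  # bounded form [i..j]
        if upper < lower:
            # Assumption: an inverted range is treated as an error.
            raise ValueError(f"upper bound below lower bound: {text!r}")
        return cls(lower, upper)

    def allows(self, count: int) -> bool:
        """Return True if `count` matching elements satisfies the annotation."""
        if count < self.lower:
            return False
        return self.upper is None or count <= self.upper


# List which counts from 0 to 5 each form accepts.
for text in ("[1..]", "[2..4]", "[2]"):
    annotation = Annotation.parse(text)
    print(text, [n for n in range(6) if annotation.allows(n)])
```

Representing an exact annotation as a range whose bounds are equal keeps the matching test identical for all three forms.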
We can use numeric annotations to restate the query with which we began this chapter, finding movies produced by exactly two studios:
Figure 4.3 includes two numeric annotations,
one on the vertex representing studio objects, and one on
the adjacent edge. The [2] annotation on the
studio vertex
indicates that the query can only
match subgraphs containing exactly two studio objects.
Of course, all the other parts of the query must also be
satisfied: those two studio objects must be linked to the same
movie object by produced links. This annotation also serves to group
the two studio-movie pairs in a single subgraph with one movie object
and two linked studio objects, rather than returning the
multiple subgraphs we saw in
Chapter 2, Query Basics.
The [1..] edge annotation is included because
we cannot assume that linked objects
in a database are connected by only a single link. If a database
contains multiple links between objects, then we usually want to group
these links, in addition to grouping the objects, in the query
results. Because we may not know how many links connect one object to
another, we use the unbounded annotation [1..] on
the edge. For now, we’ll note that this is usually the correct
annotation for an adjacent
edge and simply follow this convention in
defining the next several queries. The section on
“Understanding Multiple Annotations” later in this chapter provides a
more complete explanation of this edge annotation.
Edges adjacent to annotated vertices must be annotated for the reason cited above. Only one of two adjacent vertices may be annotated because annotating adjacent vertices can result in ambiguities in interpreting the query. Proximity enforces these requirements and will not execute queries with illegal annotations. See “Adjacency Requirements” later in this chapter for a more detailed explanation of the reasons behind these requirements.
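The two adjacency rules just stated can be expressed as a simple check. The sketch below is only an illustration of those rules in Python: the function name annotation_errors and the dictionary and tuple layout used to describe a query are assumptions made for this example and do not reflect how Proximity represents or validates queries internally.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical query description: vertex name -> annotation text (or None),
# and edges as (vertex, vertex, annotation text or None).
Vertices = Dict[str, Optional[str]]
Edges = List[Tuple[str, str, Optional[str]]]


def annotation_errors(vertices: Vertices, edges: Edges) -> List[str]:
    """Return a message for each violation of the two adjacency rules."""
    errors = []
    for first, second, edge_annotation in edges:
        first_annotated = vertices[first] is not None
        second_annotated = vertices[second] is not None
        # Rule: only one of two adjacent vertices may be annotated.
        if first_annotated and second_annotated:
            errors.append(f"adjacent vertices {first} and {second} are both annotated")
        # Rule: an edge adjacent to an annotated vertex must be annotated.
        if (first_annotated or second_annotated) and edge_annotation is None:
            errors.append(f"edge {first}-{second} needs an annotation")
    return errors


legal = annotation_errors(
    {"movie": None, "studio": "[2]"}, [("studio", "movie", "[1..]")]
)
missing_edge = annotation_errors(
    {"movie": None, "studio": "[2]"}, [("studio", "movie", None)]
)
both_annotated = annotation_errors(
    {"movie": "[1..]", "studio": "[2]"}, [("studio", "movie", "[1..]")]
)
print(legal)
print(missing_edge)
print(both_annotated)
```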
When executing an annotated query, the vertex annotation takes precedence over the edge annotation. That is, the query processor first satisfies requirements on the vertex and then checks to see if it can satisfy requirements on the corresponding edges.
To see how Proximity handles the query shown in Figure 4.3, consider the database fragment shown in Figure 4.4. This fragment contains information about studios that produced some recent Academy Award-winning pictures.
The above fragment includes four different movies: two produced by a
single studio (Forrest Gump and
Chicago), one produced by two studios
(Titanic), and one produced by three studios
(Shakespeare in Love).
Executing the query shown in Figure 4.3
on this database fragment yields the matching subgraph shown in
Figure 4.5.
Rather than returning two subgraphs, each with one movie and one
studio, this query returns a single subgraph containing the same data
that would have been spread across multiple matches had we omitted
the annotations from the query.
Because we used an exact annotation of [2] on
the studio vertex, the query does not match
subgraphs containing movies connected to a single studio or to more
than two studios. If we instead want to find all the movies produced
by two or more studios, we need to change the numeric annotation on
the studio vertex to use the unbounded
range [2..], as shown in
Figure 4.6.
The results of executing this modified query on the data shown in
Figure 4.4 are shown below:
This time, the unbounded annotation [2..] on the
studio vertex matches both the
subgraph containing the two studios that produced
Titanic and the subgraph containing the
three studios that produced Shakespeare in
Love.
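To make the difference between [2] and [2..] concrete, the matching can be imitated in Python. This is a sketch under stated assumptions, not Proximity's implementation: the movie titles and the number of studios per movie follow the description of Figure 4.4, but the studio names are placeholders, and match_movies and in_range are hypothetical helpers. The sketch checks the vertex annotation first and the edge annotation second, following the precedence described earlier. Rejecting the whole movie when any studio's links fail the edge annotation is a simplification of ours.

```python
from collections import Counter, defaultdict

# One (studio, movie) tuple per "produced" link. Studio names are
# placeholders; only the number of studios per movie follows the text.
PRODUCED = [
    ("Studio A", "Forrest Gump"),
    ("Studio B", "Chicago"),
    ("Studio A", "Titanic"),
    ("Studio C", "Titanic"),
    ("Studio B", "Shakespeare in Love"),
    ("Studio D", "Shakespeare in Love"),
    ("Studio E", "Shakespeare in Love"),
]


def in_range(count, lower, upper):
    """True if lower <= count <= upper, where upper=None means unbounded."""
    return count >= lower and (upper is None or count <= upper)


def match_movies(links, studio_range, edge_range=(1, None)):
    """Group studios by movie and keep the groups that satisfy both ranges."""
    links_by_movie = defaultdict(Counter)
    for studio, movie in links:
        links_by_movie[movie][studio] += 1

    matches = {}
    for movie, link_counts in links_by_movie.items():
        # Vertex annotation first: how many distinct studios are linked?
        if not in_range(len(link_counts), *studio_range):
            continue
        # Then the edge annotation: links between each studio and the movie.
        if all(in_range(n, *edge_range) for n in link_counts.values()):
            matches[movie] = sorted(link_counts)
    return matches


print(match_movies(PRODUCED, (2, 2)))     # studio vertex annotated [2]
print(match_movies(PRODUCED, (2, None)))  # studio vertex annotated [2..]
```

With the default edge range of [1..], the edge check always succeeds once a studio is linked at all, which mirrors why [1..] is the usual choice for the adjacent edge.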
A variation on this query structure forms one of the most common QGraph queries, the star query. Star queries find all database elements linked to a core object. Star queries are typically used to find subgraphs such as “all actors in a movie” or “all authors for a paper” (assuming the corresponding database contains the appropriate objects and links).
Star queries can use either
directed or
undirected edges.
To create a star query for the movie and studio database, we need to determine which type of object should serve as the core vertex for the query. Because this database links multiple studios to a single movie, we make the core vertex match movie objects.
The query above finds and returns a subgraph for each movie in the
database. Each subgraph includes all the studios linked to that
movie. The results of executing
this query on the database fragment in
Figure 4.4 are shown below:
Just as we can annotate vertices so that they match more than one object, we can also annotate edges so that they match more than one link. For example, the database fragment shown in Figure 4.11 contains information on several actors and the roles they played in the movie Angels in America.
The database fragment indicates that Al Pacino played a single role,
Justin Kirk played two different roles, and Meryl Streep played four
different roles in this movie.
It’s worth noting that the database fragment shown in Figure 4.11 uses a different schema to represent actors and roles from that used in Figure 3.8. The example in Chapter 3, Conditions, used multiple attribute values on a single link to indicate that an actor played multiple roles in a movie. The example in this chapter uses multiple links to represent the same kind of information. Proximity does not require any particular representational schema for a given dataset, although consistency within a dataset is important. You can determine the appropriate schema for your data.
A query that uses edge annotations to find actors playing multiple roles is shown in Figure 4.12.
Here we include the annotation [2..] on the edge
connecting the actor vertex to the
movie vertex, indicating that the query
matches actor-movie pairs connected by two or more
role links. Annotated edges can stand alone;
they do not require that any adjacent
vertices be annotated.
The existence condition on the
role edge
requires that matching edges have a Role
attribute, but doesn’t place any requirements on the specific
value of this attribute.
The results of executing this query on the database fragment shown in Figure 4.11 are shown below.
Just as vertex annotations group matching objects, the query’s edge annotation groups matching links into a single subgraph in the query’s results. Without the edge annotation, this query returns seven subgraphs—one for each unique actor-role-movie subgraph in the database.
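The grouping performed by the edge annotation can be imitated with a short Python sketch. The actor names and the number of roles each played follow the description of Figure 4.11; the role labels are placeholders, and group_role_links is a hypothetical helper rather than anything Proximity provides. We assume every link carries a Role attribute, so the existence condition is trivially satisfied.

```python
from collections import defaultdict

# One (actor, movie, role) tuple per role link. Role labels are placeholders.
ROLE_LINKS = [
    ("Al Pacino", "Angels in America", "role 1"),
    ("Justin Kirk", "Angels in America", "role 1"),
    ("Justin Kirk", "Angels in America", "role 2"),
    ("Meryl Streep", "Angels in America", "role 1"),
    ("Meryl Streep", "Angels in America", "role 2"),
    ("Meryl Streep", "Angels in America", "role 3"),
    ("Meryl Streep", "Angels in America", "role 4"),
]


def group_role_links(links, lower, upper=None):
    """Group links by (actor, movie) and keep groups within [lower..upper]."""
    roles_by_pair = defaultdict(list)
    for actor, movie, role in links:
        roles_by_pair[(actor, movie)].append(role)
    return {
        pair: roles
        for pair, roles in roles_by_pair.items()
        if len(roles) >= lower and (upper is None or len(roles) <= upper)
    }


# Edge annotated [2..]: one grouped result per actor with two or more roles.
multi_role = group_role_links(ROLE_LINKS, lower=2)
for (actor, movie), roles in multi_role.items():
    print(actor, movie, len(roles))

# Without the annotation, every link would be its own result.
print(len(ROLE_LINKS))
```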
An annotation of [1] is not
equivalent
to no
annotation. A [1]
annotation requires that the query only match subgraphs that contain
exactly one of the annotated entities. A query with no annotation
will match each appropriate database entity regardless of number,
although it will not group the matches into a single subgraph. This
can be seen by comparing the results for the two queries below.
The query on the right includes an exact [1]
annotation on the studio vertex (and
the standard [1..] annotation on the
incident edge to satisfy QGraph’s adjacency requirements for
annotations). The query on the left has no annotations.
Executing these queries on the database fragment shown in
Figure 4.4 yields distinctly different
results. Figure 4.15 shows the
results from the unannotated query.
The query without annotations matches all the studio-movie pairs in the
database. Studios are not grouped; each match forms a separate
subgraph. Compare these results to those for the query containing the
[1] vertex annotation.
The results of executing the annotated query include just
two subgraphs, matching the two
instances in the database where a movie is linked to exactly one
studio.
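The contrast between the two queries can be reproduced in a few lines of Python. As before, this is a sketch rather than Proximity's behavior in detail: the studio names are placeholders standing in for the labels in Figure 4.4, and unannotated_matches and exactly_one_studio are hypothetical helpers.

```python
from collections import defaultdict

# One (studio, movie) tuple per "produced" link; studio names are placeholders.
PRODUCED = [
    ("Studio A", "Forrest Gump"),
    ("Studio B", "Chicago"),
    ("Studio A", "Titanic"),
    ("Studio C", "Titanic"),
    ("Studio B", "Shakespeare in Love"),
    ("Studio D", "Shakespeare in Love"),
    ("Studio E", "Shakespeare in Love"),
]


def unannotated_matches(links):
    """No annotation: every studio-movie pair is a separate, ungrouped match."""
    return [(movie, [studio]) for studio, movie in links]


def exactly_one_studio(links):
    """[1] on the studio vertex: keep movies linked to exactly one studio."""
    studios_by_movie = defaultdict(set)
    for studio, movie in links:
        studios_by_movie[movie].add(studio)
    return [
        (movie, sorted(studios))
        for movie, studios in studios_by_movie.items()
        if len(studios) == 1
    ]


print(len(unannotated_matches(PRODUCED)))  # one match per studio-movie pair
print(exactly_one_studio(PRODUCED))        # only the single-studio movies
```

The unannotated version still reports Titanic and Shakespeare in Love, once per studio, whereas the [1] version drops them entirely.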