Chapter 6. Subqueries

Table of Contents

Subqueries and Annotations
Subqueries and Constraints
Constraints within a subquery
Constraints crossing the subquery boundary
Multiple Subqueries
Nested subqueries
Implementation in Proximity
Edge requirements
Annotation requirements
Nested subqueries
Constraint restrictions
Summary

A subquery is a connected subgraph of vertices and edges that can be treated as a logical unit. Using subqueries expands the expressive power of QGraph, enabling you to identify more complex structures than could be found otherwise.

The example with which we opened this Guide, repeated in Figure 6.1, shows a query that finds subgraphs containing a director, all the movies he or she has directed, and all the actors who have appeared in those movies. This example contains a subquery, denoted by the box surrounding the movie and actor vertices and the acted-in edge.

Figure 6.1. Example query with subquery [Intro_DB01_Q01.qg2.xml]


The subquery in Figure 6.1 is linked to its parent query by an edge connecting the movie vertex to the director vertex. We call this edge (the edge labeled directed in the above query) a boundary edge of the subquery. All subqueries must be connected to one or more vertices in the main query. (However, see “Implementation in Proximity” later in this chapter for restrictions on how subqueries can be connected to the main query in Proximity.)

Subqueries expand QGraph’s expressive power by letting you attach a numeric annotation to a connected set of vertices and edges instead of just a single vertex or edge. This effectively lets you treat a more complex structure as if it were a single vertex. For example, if we replace the subquery in Figure 6.1 with a single vertex, we see the familiar star query structure.

Figure 6.2. Conceptual structure of query in Figure 6.1


This diagram shows the subquery as a single vertex. The [1..] annotation on the subquery means that the complex structures matching the subquery are grouped in the same way that objects are grouped when matching an annotated vertex. Thus all the movies and their linked actors for a specific director will be included in a single subgraph. Executing this query on the sample database contained in Intro_DB01.xml returns six subgraphs, one for each director object in the database. The subgraph where the director vertex matches Steven Spielberg is shown in Figure 6.3. (Edge labels have been removed to save space.)

Figure 6.3. Query results for director = Steven Spielberg (edge labels omitted)


All the movies directed by Steven Spielberg, as well as all the actors linked to those movies, are included in this single subgraph.

Compare the subgraph shown in Figure 6.3 to the results of a similar query that does not use subqueries. Figure 6.4 shows the query from Figure 6.1, but without the subquery box and subquery annotation.

Figure 6.4. Similar query without subquery [SubQ_DB01_Q02.qg2.xml]


Executing this query on the same data returns 23 subgraphs, one for each movie in the database. The subgraphs in which the director vertex matches Steven Spielberg are shown in Figure 6.5.

Figure 6.5. Query results for director = Steven Spielberg (edge labels omitted)


Because the movie vertex is not annotated, the query cannot group all the movies for a single director and must return a separate subgraph for each director-movie pair in the data.
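The difference between the two result sets can be sketched in ordinary code. The Python below is only an illustration of the grouping behavior described above, not how Proximity executes queries. The data, the function names, and the rule that a movie with no linked actors is skipped are all assumptions made for this sketch; the data is invented and is not the contents of Intro_DB01.xml.

```python
from collections import defaultdict

# Toy data invented for this sketch; it is NOT the contents of Intro_DB01.xml.
DIRECTED = [
    ("Director A", "Movie 1"),
    ("Director A", "Movie 2"),
    ("Director B", "Movie 3"),
]
ACTED_IN = [
    ("Actor X", "Movie 1"),
    ("Actor Y", "Movie 1"),
    ("Actor X", "Movie 2"),
    ("Actor Z", "Movie 3"),
]


def actors_by_movie(acted_in):
    """Map each movie to the list of actors linked to it."""
    cast = defaultdict(list)
    for actor, movie in acted_in:
        cast[movie].append(actor)
    return dict(cast)


def match_without_subquery(directed, acted_in):
    """One result per director-movie pair (no grouping of movies)."""
    cast = actors_by_movie(acted_in)
    return [
        {"director": director, "movie": movie, "actors": cast[movie]}
        for director, movie in directed
        if movie in cast  # assumption: a movie needs at least one linked actor
    ]


def match_with_subquery(directed, acted_in):
    """One result per director; the movie-actor structures are grouped."""
    cast = actors_by_movie(acted_in)
    movies_by_director = defaultdict(dict)
    for director, movie in directed:
        if movie in cast:
            movies_by_director[director][movie] = cast[movie]
    return [
        {"director": director, "movies": movies}
        for director, movies in movies_by_director.items()
    ]


per_pair = match_without_subquery(DIRECTED, ACTED_IN)
per_director = match_with_subquery(DIRECTED, ACTED_IN)
print(len(per_pair), "results without the subquery")
print(len(per_director), "results with the subquery")
```

With this toy data the ungrouped version produces one result for each of the three director-movie pairs, while the grouped version produces one result for each of the two directors.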

The inner structure of a subquery must be a well-formed query in its own right. (The inner structure of a subquery is the part that remains after removing the boundary edges and the subquery box with its annotation.) For example, the inner structure of the subquery in Figure 6.1, shown below, is itself a valid query.

Figure 6.6. Inner structure of subquery [SubQ_DB01_Q01_SubQ.qg2.xml]


The inner structure of this subquery is the familiar star query that finds all actors linked to a single movie. If we choose, we can create and execute a new query containing just this structure.

In particular, because disconnected queries are not well-formed, subqueries cannot be disconnected. That is, all the vertices and edges inside the subquery box must be connected in a single graph. For example, the query shown in Figure 6.7 contains a disconnected subquery.

Figure 6.7. Illegal query containing disconnected subquery


If we look only at the elements inside the subquery box, we see that the D vertex is not connected to the other subquery components (the B and C vertices and the Z edge). Although the query as a whole is connected, the query shown in Figure 6.7 is illegal because it contains a disconnected subquery.
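The connectivity rule can be checked mechanically: keep only the edges whose endpoints both lie inside the subquery box, then test whether those edges join all of the subquery's vertices. The following Python is a minimal sketch of that check, not Proximity's validation code. The B, C, and D vertices and the edge between B and C follow the description of Figure 6.7; the outside vertex A and the two boundary edges are assumptions added so that the query as a whole is connected.

```python
def inner_edges(subquery_vertices, edges):
    """Keep only edges with both endpoints inside the subquery box.

    Boundary edges (one endpoint outside the box) are dropped, which
    mirrors the definition of a subquery's inner structure.
    """
    inside = set(subquery_vertices)
    return [(a, b) for a, b in edges if a in inside and b in inside]


def is_connected(vertices, edges):
    """Return True if the undirected graph on `vertices` is connected."""
    vertices = set(vertices)
    if not vertices:
        return False
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        if a in vertices and b in vertices:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return seen == vertices


# Hypothetical encoding of a query shaped like Figure 6.7.
# Vertex A and the edges A-B and A-D are assumptions of this sketch.
query_vertices = ["A", "B", "C", "D"]
query_edges = [("A", "B"), ("A", "D"), ("B", "C")]  # ("B", "C") is the Z edge
subquery_vertices = ["B", "C", "D"]

whole_query_ok = is_connected(query_vertices, query_edges)
subquery_ok = is_connected(
    subquery_vertices, inner_edges(subquery_vertices, query_edges)
)
print("whole query connected:", whole_query_ok)
print("subquery connected:", subquery_ok)
```

In this encoding the whole query passes the connectivity test, but the subquery fails it because D is reachable only through the vertex outside the box.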

Subqueries and Annotations

QGraph requires that all subqueries be annotated. An unannotated subquery would be equivalent to the same query structure without the subquery box: if we could remove the annotation from the subquery box and then run the query shown in Figure 6.1, we would see the same results we saw in Figure 6.5. Because unannotated subqueries would duplicate capabilities available through other QGraph elements, they would add nothing to QGraph’s expressive power, which is why the annotation is required.

Because subqueries are annotated, they must obey all the QGraph rules that apply to annotated query elements. For example, QGraph requires that the edge adjacent to an annotated element must itself be annotated. Therefore the boundary edge(s) of an annotated subquery must always be annotated.

The query shown in Figure 6.1 illustrates the proper annotation of a subquery’s boundary edge. As we saw in Chapter 4, Numeric Annotations, other annotations are also legal, but most queries will probably use the [1..] annotation on the boundary edge.

Because QGraph prohibits edges connecting two annotated elements, no numeric annotation is allowed on a vertex adjacent to an annotated subquery. For example, the query structure shown below is illegal:

Figure 6.8. Illegal annotation (vertex A)


As we saw in the first part of this chapter, we can mentally substitute a single vertex for a subquery to better understand the conceptual structure of a query. This heuristic also works well for spotting potential annotation problems involving subqueries. For example, we can visualize the query shown in Figure 6.8 as the structure shown below:

Figure 6.9. Conceptual structure of illegally annotated query


This conceptual view helps us see that the query includes an edge connecting two annotated elements, the subquery and the A vertex. Because this conceptual structure includes an illegal annotation, we can more easily see that the corresponding query is also illegal.
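The same substitution can be written as a small procedure: collapse the subquery to one conceptual vertex, then apply the two annotation rules stated in this section to the collapsed graph. The Python below is a sketch under stated assumptions and is not part of Proximity. The vertex names inside the subquery (B and C), the placeholder name SUBQ, and the data layout are all hypothetical; only the annotated A vertex adjacent to an annotated subquery comes from Figure 6.8.

```python
def collapse_subquery(edges, subquery_vertices, name="SUBQ"):
    """Replace all subquery vertices with one conceptual vertex.

    Edges entirely inside the subquery disappear; boundary edges are
    re-attached to the conceptual vertex.
    """
    inside = set(subquery_vertices)
    collapsed = []
    for a, b, edge_annotation in edges:
        a2 = name if a in inside else a
        b2 = name if b in inside else b
        if a2 != b2:
            collapsed.append((a2, b2, edge_annotation))
    return collapsed


def annotation_problems(edges, element_annotations):
    """Check the two annotation rules described in this section."""
    problems = []
    for a, b, edge_annotation in edges:
        a_annotated = bool(element_annotations.get(a))
        b_annotated = bool(element_annotations.get(b))
        if a_annotated and b_annotated:
            problems.append(f"edge {a}-{b} connects two annotated elements")
        elif (a_annotated or b_annotated) and not edge_annotation:
            problems.append(
                f"edge {a}-{b} is adjacent to an annotated element "
                "but is not annotated"
            )
    return problems


# Hypothetical query: A is outside the box; B and C are inside it.
# Each edge is (vertex, vertex, annotation or None).
edges = [("A", "B", "[1..]"), ("B", "C", "[1..]")]
conceptual = collapse_subquery(edges, ["B", "C"])

# Both A and the subquery are annotated, as in Figure 6.8.
illegal = annotation_problems(conceptual, {"A": "[1..]", "SUBQ": "[1..]"})
# Only the subquery is annotated, and its boundary edge is annotated too.
legal = annotation_problems(conceptual, {"SUBQ": "[1..]"})
print(illegal)
print(legal)
```

Because the check runs on the collapsed graph, it reports the problem in the same terms as the conceptual view: an edge between A and the subquery that connects two annotated elements.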