The queries we’ve examined so far work fine when we know the exact structure of the subgraphs we want to find in the database. For example, if we want to find movies produced by two different studios, we create a query that includes two studio vertices, one for each studio credited with producing the movie, as shown in Figure 4.1.
But this query has some problems. As we saw in Chapter 2, Query Basics, in addition to returning the desired subgraphs, this query’s results will include subgraphs with duplicated elements, that is, with the same studio matching both the studio1 and studio2 vertices. And what if we instead want to find movies produced by two or more studios? We would have to create separate queries for movies produced by three studios, by four studios, and so on. How high do we go? In many cases, we won’t know the upper bound ahead of time. How can we create a query that finds all movies and their associated studios, without including duplicated elements, regardless of the number of studios involved?
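To make the duplicated-element problem concrete, here is a minimal Python sketch. It is a toy model and not QGraph's matching algorithm: it assumes that the studio1 and studio2 vertices are bound independently to any studio linked to a movie. The data and the names `PRODUCED_BY` and `match_two_studios` are hypothetical.

```python
from itertools import product

# Hypothetical sample data: movie title -> studios credited with producing it.
PRODUCED_BY = {
    "Movie A": ["Studio X", "Studio Y"],
    "Movie B": ["Studio Z"],
}

def match_two_studios(produced_by):
    """Bind studio1 and studio2 independently to any studio linked to the movie.

    Assumption: nothing prevents both query vertices from matching the
    same studio, which is the source of the duplicated elements.
    """
    matches = []
    for movie, studios in produced_by.items():
        for studio1, studio2 in product(studios, repeat=2):
            matches.append((movie, studio1, studio2))
    return matches

matches = match_two_studios(PRODUCED_BY)

# Matches in which one studio fills both the studio1 and studio2 roles.
duplicated = [m for m in matches if m[1] == m[2]]
```

In this toy model, "Movie B" appears in the results even though it has only one studio, because that studio is bound to both query vertices.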
Recall, as well, that the queries described so far return separate subgraphs for each match. Consider the author-book query shown in Figure 4.2.
If our database contains 40 different books written by Stephen King, the query will return 40 different subgraphs, one for each author-book pair, even though all contain the same author. How can we create a query that collapses all the resulting subgraphs into a structure that more closely resembles the underlying structure of the data?
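The one-subgraph-per-pair behavior can be sketched in a few lines of Python. This is only an illustration of the shape of the results, not QGraph output; the placeholder book titles and the name `match_author_book` are hypothetical.

```python
# Hypothetical data: 40 placeholder titles standing in for the author's books.
AUTHOR = "Stephen King"
BOOKS = [f"Book {i}" for i in range(1, 41)]

def match_author_book(author, books):
    """Return one match per author-book pair, as an ungrouped query would."""
    return [{"author": author, "book": book} for book in books]

flat_matches = match_author_book(AUTHOR, BOOKS)

# Every match repeats the same author vertex.
distinct_authors = {m["author"] for m in flat_matches}
print(len(flat_matches), len(distinct_authors))  # prints "40 1"
```

The results hold 40 separate records for a single author, whereas the underlying data is better described as one author connected to 40 books.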
These cases are handled by numeric annotations. Numeric annotations place limits on the number of isomorphic structures that can occur in matching portions of the database. Limits can involve lower bounds, upper bounds, or both. Numeric annotations also group isomorphic structures that would otherwise produce multiple matches into a single subgraph in the query results. QGraph does not provide any mechanism for limiting the number of matching substructures without grouping the results.
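The combined effect of bounding and grouping can be approximated in plain Python. The sketch below is an analogy under stated assumptions, not QGraph syntax or its implementation: `group_with_bounds`, its `lower` and `upper` parameters, and the sample links are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (movie, studio) links.
LINKS = [
    ("Movie A", "Studio X"),
    ("Movie A", "Studio Y"),
    ("Movie B", "Studio Z"),
    ("Movie C", "Studio X"),
    ("Movie C", "Studio Y"),
    ("Movie C", "Studio Z"),
]

def group_with_bounds(links, lower=1, upper=None):
    """Group studios under their movie and keep movies whose studio count
    lies within [lower, upper]; upper=None means no upper bound."""
    grouped = defaultdict(list)
    for movie, studio in links:
        if studio not in grouped[movie]:  # count each studio only once
            grouped[movie].append(studio)
    return {
        movie: studios
        for movie, studios in grouped.items()
        if len(studios) >= lower and (upper is None or len(studios) <= upper)
    }

# Lower bound only: movies produced by two or more studios.
two_or_more = group_with_bounds(LINKS, lower=2)

# Lower and upper bound: movies produced by exactly two studios.
exactly_two = group_with_bounds(LINKS, lower=2, upper=2)
```

Each surviving movie yields a single grouped result containing all of its studios, so no separate query is needed for each possible studio count, and no studio appears twice within a result.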