Chapter 4. Numeric Annotations

Table of Contents

The Need for Counting
Annotation Basics
Understanding Multiple Annotations
Negated and Optional Elements
Annotating edges adjacent to negative and optional vertices
Negated elements versus inequality conditions
Adjacency Requirements
Implementation in Proximity
Implementation restrictions
Efficiency considerations
Summary

The Need for Counting

The queries we’ve examined so far work well when we know the exact structure of the subgraphs we want to find in the database. For example, if we want to find movies produced by two different studios, we create a query that includes two studio vertices, one for each studio credited with producing the movie, as shown in Figure 4.1.

Figure 4.1. Movies produced by two studios [Annot_DB01_Q01.qg2.xml]


But this query has some problems. As we saw in Chapter 2, Query Basics, in addition to returning the desired subgraphs, this query’s results will include subgraphs with duplicated elements, that is, with the same studio matching both the studio1 and studio2 vertices. And what if we want to instead find movies produced by two or more studios? We have to create separate queries for movies produced by three studios, by four studios, and so on. How high do we go? In many cases, we won’t know the upper bound ahead of time. How can we create a query that finds all movies and their associated studios, without including duplicated elements, regardless of the number of studios involved?
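The duplication problem can be illustrated outside of QGraph with a minimal Python sketch. It is not how Proximity executes queries; the toy data and the function name `naive_two_studio_matches` are hypothetical and serve only to show what happens when the studio1 and studio2 vertices are bound independently of each other.

```python
from itertools import product

# Hypothetical toy data: movie -> studios credited with producing it.
produced_by = {
    "Movie A": ["Studio X", "Studio Y"],
    "Movie B": ["Studio Z"],
}

def naive_two_studio_matches(produced_by):
    """Bind studio1 and studio2 independently for each movie.

    This mimics a query with two studio vertices and no constraint
    requiring the two vertices to match different studios.
    """
    matches = []
    for movie, studios in produced_by.items():
        for studio1, studio2 in product(studios, repeat=2):
            matches.append((movie, studio1, studio2))
    return matches

matches = naive_two_studio_matches(produced_by)

# Matches in which the same studio fills both studio vertices.
duplicated = [m for m in matches if m[1] == m[2]]

for match in matches:
    print(match)
```

In this sketch, "Movie B" appears in the results even though only one studio produced it, because that studio is bound to both vertices. Each additional studio count (three, four, and so on) would need its own version of the function, which mirrors the need for a separate query per count.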

Recall, as well, that the queries described so far return separate subgraphs for each match. Consider the author-book query shown in Figure 4.2.

Figure 4.2. Simple author-book query


If our database contains 40 different books written by Stephen King, the query will return 40 different subgraphs, one for each author-book pair, even though all contain the same author. How can we create a query that collapses all the resulting subgraphs into a structure that more closely resembles the underlying structure of the data?

These cases are handled by numeric annotations. Numeric annotations place limits on the number of isomorphic structures that can occur in matching portions of the database. A limit can specify a lower bound, an upper bound, or both. Numeric annotations also group isomorphic structures that would otherwise produce multiple matches in the query results into a single subgraph. QGraph does not provide any mechanism for limiting the number of matching substructures without grouping the results.
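The two effects described above, grouping and counting, can be sketched in plain Python. This is only an illustration of the idea under stated assumptions, not QGraph syntax or Proximity's implementation: the function `group_with_bounds`, its `lower` and `upper` parameters, and the sample data are all hypothetical.

```python
from collections import defaultdict

def group_with_bounds(pairs, lower=1, upper=None):
    """Collapse (anchor, item) matches into one group per anchor,
    keeping only groups whose size lies within [lower, upper].

    upper=None means no upper bound.
    """
    grouped = defaultdict(list)
    for anchor, item in pairs:
        grouped[anchor].append(item)
    return {
        anchor: items
        for anchor, items in grouped.items()
        if len(items) >= lower and (upper is None or len(items) <= upper)
    }

# Hypothetical matches from a simple author-book query: one pair per match.
pairs = [("Stephen King", f"Book {i}") for i in range(1, 41)]
pairs.append(("Other Author", "Only Book"))

all_groups = group_with_bounds(pairs)                # one group per author
prolific = group_with_bounds(pairs, lower=2)         # lower bound only
small = group_with_bounds(pairs, lower=1, upper=10)  # both bounds

print(len(pairs), "separate matches collapse into", len(all_groups), "groups")
```

Note that the bounds are checked against the size of each group, so in this sketch, as in the text, limiting the count and grouping the results go together.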