Subqueries and Constraints

In this section we examine how subquery elements can be used in constraints. A constraint may compare two elements within the same subquery, or it may compare elements on opposite sides of the subquery boundary. Constraints that involve elements inside a subquery must obey the same rules that apply to any QGraph constraint. Proximity’s current implementation of QGraph imposes additional restrictions on constraints that cross the subquery boundary. See “Implementation in Proximity” later in this chapter for information on these restrictions.

Constraints within a subquery

Let’s look first at a constraint that compares two items within the same subquery. Our example uses a database containing information on student and faculty web pages and their interconnecting links. A fragment of such a database is shown in Figure 6.10.

Figure 6.10. Database fragment [SubQ_DB02.xml]


It’s common for two web pages to link to each other and for pages to link to themselves. We see this reflected in our database fragment where, for example, page01.html and page03.html point to each other and where page06.html links to itself. Our goal is to find all the student pages that we can reach by following exactly two links (hops) from a faculty page. Figure 6.11 shows a first pass at creating such a query.

Figure 6.11. Query [SubQ_DB02_Q01.qg2.xml]


We want our results to include a single subgraph for each faculty member, so the query uses a subquery to group the cluster of pages linked from each linked-page vertex.

Because the database contains objects that link to themselves, this query incorrectly identifies some pages as being two hops away when the second hop follows a self link. We see this in the results of executing the query on our database fragment:

Figure 6.12. Query results
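The same effect can be reproduced outside of QGraph. The Python sketch below is an illustration only, not Proximity code: the link table is a small hypothetical graph (it is not the data in Figure 6.10), and the helper name two_hop_matches is our own invention. It follows exactly two links from a starting page and shows how a self link produces a match in which the same page fills both the linked-page and student roles.

```python
# Hypothetical link table (NOT the database fragment from the figures):
# each page maps to the list of pages it links to.
LINKS = {
    "faculty.html": ["a.html", "b.html"],
    "a.html": ["a.html", "c.html"],  # a.html links to itself
    "b.html": ["c.html"],
    "c.html": [],
}

# Hypothetical set of pages that are student pages.
STUDENT_PAGES = {"a.html", "c.html"}


def two_hop_matches(start):
    """Return (linked_page, student) pairs reached by following exactly two links."""
    matches = []
    for linked in LINKS.get(start, []):          # first hop
        for student in LINKS.get(linked, []):    # second hop
            if student in STUDENT_PAGES:
                matches.append((linked, student))
    return matches


matches = two_hop_matches("faculty.html")

# Pairs in which one page plays both roles exist only because of a self link.
spurious = [pair for pair in matches if pair[0] == pair[1]]
```

In this toy graph, spurious contains the pair in which a.html appears as both the linked page and the student page, which mirrors the unwanted matches in the query results.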


Because page06.html links to itself, the top subgraph shows page06.html as being both one and two hops away from page01.html. Similarly, because page05.html links to itself, the bottom subgraph shows that page05.html matches both the linked-page and student vertices. To eliminate these matches, we add an identity constraint that requires that the linked-page and student vertices not match the same object.
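To make the effect of such a constraint concrete, here is a minimal Python sketch. It assumes a hypothetical edge list and an invented helper, grouped_matches; neither comes from Proximity, and the page names are not those of our database fragment. The helper groups student pages under the linked page that reaches them, loosely imitating the grouping a subquery provides, and the require_distinct flag stands in for the identity constraint.

```python
from collections import defaultdict

# Hypothetical edge list of (source, target) links; illustration only.
EDGES = [
    ("prof.html", "s1.html"),
    ("s1.html", "s1.html"),  # self link
    ("s1.html", "s2.html"),
]

# Hypothetical set of student pages.
STUDENTS = {"s1.html", "s2.html"}


def grouped_matches(faculty, require_distinct):
    """Group student pages two hops from `faculty` by the intermediate linked page.

    When require_distinct is True, a page may not fill both the linked-page
    and the student role in the same match.
    """
    out_links = defaultdict(list)
    for source, target in EDGES:
        out_links[source].append(target)

    groups = {}
    for linked in out_links[faculty]:
        students = [
            page
            for page in out_links[linked]
            if page in STUDENTS and not (require_distinct and page == linked)
        ]
        if students:
            groups[linked] = students
    return groups


unconstrained = grouped_matches("prof.html", require_distinct=False)
constrained = grouped_matches("prof.html", require_distinct=True)
```

Without the flag, s1.html is listed as a student page beneath itself; with the flag, only s2.html remains in that group. Note that s1.html still appears as the linked page, because the constraint only forbids one object from matching both vertices.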

Figure 6.13. Revised query with constraint [SubQ_DB02_Q02.qg2.xml]


The results of executing this modified query on our database fragment are shown below:

Figure 6.14. Revised query results


The two student pages, page05.html and page06.html, that were included only because of their self links no longer appear in the query results.