Debugging XPath

In this section we propose a debugging technique for XPath expressions. Our debugging process accepts as input a query Q preceded by the [DEBUG=r] command, where r is a real number in the unit interval. For instance, in what follows we will focus on <<[DEBUG=0.5]/bib/book/title>>, assuming an input XML document like the one depicted in Figures 1 and 2.

Fig. 1 - XML skeleton represented as a tree

 

Fig. 2 - Input XML document in our examples

The debugging process produces a set of alternative queries Q1,...,Qn packed into an output XML document, like the one shown in Figure 3. The document has the following structure:

where the set of alternatives is ordered with respect to the cd key. This value measures the chance degree of the original query with respect to the new one: the more changes are performed on Qi, and the more drastic they are with respect to Q, the lower the cd value becomes.
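As a minimal sketch of how such an output document could be consumed, the following Python fragment parses a set of query tags and lists them by decreasing cd. The root element name result and the function name alternatives_by_cd are assumptions made only for this illustration; the query tags themselves are taken from the examples discussed in this section.

```python
import xml.etree.ElementTree as ET

# Assumption: the root element name "result" is hypothetical. The <query>
# elements mirror alternatives quoted in the text, listed here out of order.
OUTPUT_DOC = """
<result>
  <query cd="0.8" book="novel">/bib/[SWAP=0.8]novel/title</query>
  <query cd="0.225" bib="" book="//" title="name">/[DELETE=0.5][JUMP=0.5]//[SWAP=0.9]name</query>
  <query cd="1">/bib/book/title</query>
</result>
"""


def alternatives_by_cd(xml_text):
    """Return (cd, changes, path) triples, highest cd first."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for query in root.iter("query"):
        changes = dict(query.attrib)      # e.g. {"cd": "0.8", "book": "novel"}
        cd = float(changes.pop("cd"))     # keep only the tag-level changes
        rows.append((cd, changes, query.text.strip()))
    return sorted(rows, key=lambda row: row[0], reverse=True)


for cd, changes, path in alternatives_by_cd(OUTPUT_DOC):
    print(cd, changes, path)
```

The original query comes first because no change has been applied to it, and the alternative combining three changes comes last.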

Fig. 3 - Debugging query <<[DEBUG=0.5]/bib/book/title>>

In Figure 3, the first alternative, with the highest cd, is just the original query; thus its cd is 1, and its further execution with fuzzy XPath returns <<Don Quijote de La Mancha>>. As commented in the introduction, we assume that the debugger is run even when the set of answers is not empty, as in this case. The remaining alternatives have different cd's, depending on the changes performed, and provide XPath expressions annotated with JUMP, DELETE and SWAP.

In order to explain the way in which our technique generates the attributes and content of each query tag in the output XML document, let us consider a generic path Q of the form: <<[DEBUG=r]/tag1/.../tagi/tagi+1/...>>, where we say that tagi is at level i in the original query. So, assume that during the exploration of the input query Q and the XML document D, we find that tagi in Q does not occur at level i in (a branch of) D. Then, we consider the following three situations:

Swapping case: Instead of tagi, we find tag′i at level i in the input XML document D, where tagi and tag′i are two similar terms with similarity degree s. Then, we generate an alternative query by adding the attribute tagi="tag′i" and replacing in the original path the occurrence "tagi/" by "[SWAP=s]tag′i/".

The second query proposed in Figure 3 illustrates this case:

<query cd="0.8" book="novel">/bib/[SWAP=0.8]novel/title</query>

Let us observe that: 1) we have included the attribute <<book="novel">> in order to suggest that, instead of looking for a book, finding a novel should also be a good alternative; 2) in the path we have replaced the tag book by novel, and we have marked the exact place where the change has been performed with the annotation [SWAP=0.8]; and 3) the cd of the new query has been adjusted with the similarity degree 0.8 of the exchanged tags.
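The three observations above can be mimicked with a short Python sketch. It is only an illustration under stated assumptions, not the implementation of our debugger: the function names swap_alternative and render are hypothetical, levels are counted from 1, and the similarity degree is passed in by hand instead of being looked up in a similarity relation.

```python
def swap_alternative(tags, level, similar_tag, s):
    """Build the alternative that swaps the tag found at `level` (1-based).

    tags        -- tags of the original path, e.g. ["bib", "book", "title"]
    similar_tag -- the similar tag actually found in the document
    s           -- similarity degree between both tags (supplied by the caller)
    """
    old_tag = tags[level - 1]
    steps = list(tags)
    steps[level - 1] = f"[SWAP={s:g}]{similar_tag}"   # annotate the change
    return {
        "cd": s,                                       # adjusted with s
        "attrs": {old_tag: similar_tag},               # e.g. book="novel"
        "path": "/" + "/".join(steps),
    }


def render(alt):
    """Serialize an alternative as a <query> tag."""
    attrs = "".join(f' {k}="{v}"' for k, v in alt["attrs"].items())
    return f'<query cd="{alt["cd"]:g}"{attrs}>{alt["path"]}</query>'


alt = swap_alternative(["bib", "book", "title"], 2, "novel", 0.8)
print(render(alt))
```

With these inputs the sketch reproduces the query tag of the second alternative shown above.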

Now, we can run the (fuzzy) XPath queries <</bib/novel/title>> and even <</bib/[SWAP=0.8]novel/title>> (see Figure 4). In both cases we obtain the same result, i.e., <<La Celestina>>, but with different rsv (1 and 0.8, respectively).

Figure 4: Execution of query <</bib/[SWAP=0.8]novel/title>>

Jumping case: Even when tagi is not found at level i in the input XML document D, tagi+1 appears at a deeper level (i.e., greater than i) in a branch of D. Then, we generate an alternative query by adding the attribute tagi="//", which means that tagi has been jumped, and replacing in the path the occurrence "tagi/" by "[JUMP=r]//", where r is the value associated to DEBUG.

Figure 5: Execution of query <</bib/[JUMP=0.5]//title>>

Figure 6: Execution of query <</[JUMP=0.5]//book/title>>

This situation is illustrated by the third and fourth queries in Figure 3, where we propose to jump the tags book and bib. The execution of these queries returns different results, as shown in Figures 5 and 6, where JUMP produces effects similar to those of the DEEP command explained in the previous sections; that is, the more tags are jumped, the lower the resulting cd's become.
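The rewriting performed in the jumping case can be sketched in Python as follows. The function name jump_alternative is hypothetical, levels are counted from 1, and taking the cd of a single jump to be r is an assumption that is consistent with the combined example discussed later, where jumping one tag contributes the value 0.5.

```python
def jump_alternative(tags, level, r):
    """Build the alternative that jumps the tag at `level` (1-based).

    The occurrence "tag_i/" is replaced by "[JUMP=r]//", so the jumped
    tag must be followed by at least one more tag in the path.
    """
    if not 1 <= level < len(tags):
        raise ValueError("a jumped tag must be followed by another tag")
    # Every tag but the last one carries its trailing slash.
    segments = [tag + "/" for tag in tags[:-1]] + [tags[-1]]
    jumped = tags[level - 1]
    segments[level - 1] = f"[JUMP={r:g}]//"
    return {
        "cd": r,                     # assumption: one jump contributes r
        "attrs": {jumped: "//"},     # e.g. book="//"
        "path": "/" + "".join(segments),
    }


tags = ["bib", "book", "title"]
print(jump_alternative(tags, 2, 0.5)["path"])   # jump the tag book
print(jump_alternative(tags, 1, 0.5)["path"])   # jump the tag bib
```

The two calls yield the paths executed in Figures 5 and 6, namely /bib/[JUMP=0.5]//title and /[JUMP=0.5]//book/title.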

Deletion case: This situation emerges when, at level i in the input XML document D, we find tagi+1 instead of tagi. Intuition then tells us that tagi should be removed from the original query Q, and hence we generate an alternative query by adding the attribute tagi="" and replacing in the path the occurrence "tagi/" by "[DELETE=r]", where r is the value associated to DEBUG.

This situation is illustrated by the fifth query in Figure 3, where the deletion of the tag book is followed by a swapping of the similar tags title and name. The cd 0.45 associated to this query is defined as the product of the values associated to both DELETE (0.5) and SWAP (0.9), and hence the chance degree of the original query is lower than in the previous examples.
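The following Python sketch chains a deletion and a swap on the same path and accumulates the cd as a product, as in the fifth query. It is a simplified illustration, not our actual implementation: the class Alternative and the helpers start, delete and swap are hypothetical names, and the positions and degrees are supplied by hand rather than discovered by exploring the document.

```python
from dataclasses import dataclass, field


@dataclass
class Alternative:
    segments: list                                # e.g. ["bib/", "book/", "title"]
    attrs: dict = field(default_factory=dict)     # tag-level changes
    cd: float = 1.0                               # no change applied yet

    @property
    def path(self):
        return "/" + "".join(self.segments)


def start(tags):
    """Wrap the original path; every tag but the last keeps its slash."""
    return Alternative([tag + "/" for tag in tags[:-1]] + [tags[-1]])


def delete(alt, level, tag, r):
    """Replace "tag/" at `level` (1-based) by "[DELETE=r]"."""
    alt.segments[level - 1] = f"[DELETE={r:g}]"
    alt.attrs[tag] = ""
    alt.cd *= r


def swap(alt, level, tag, similar_tag, s):
    """Replace `tag` at `level` (1-based) by "[SWAP=s]similar_tag"."""
    slash = "/" if alt.segments[level - 1].endswith("/") else ""
    alt.segments[level - 1] = f"[SWAP={s:g}]{similar_tag}{slash}"
    alt.attrs[tag] = similar_tag
    alt.cd *= s


alt = start(["bib", "book", "title"])
delete(alt, 2, "book", 0.5)
swap(alt, 3, "title", "name", 0.9)
print(alt.path, alt.attrs, alt.cd)
```

The resulting path is /bib/[DELETE=0.5][SWAP=0.9]name, with the attributes book="" and title="name" and a cd equal to the product of 0.5 and 0.9.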

As seen in Figure 7, the execution of our new query is able to retrieve the information contained in the first branch of the input XML document shown in Figures 1 and 2. This illustrates that the execution of debugged XPath expressions reveals hidden answers that can fulfill the programmer's expectations.

Figure 7: Execution of query <</bib/[DELETE=0.5][SWAP=0.9]name>>

As we have seen in the previous example, the combined use of one or more debugging commands (SWAP, JUMP and DELETE) is not only allowed but also frequent. In other words, it is possible to find several debugging points.

In Figure 8, we can see the execution of the query: << <query cd="0.225" bib="" book="//" title="name">/[DELETE=0.5][JUMP=0.5]//[SWAP=0.9]name</query> >>. The cd 0.225 is quite low, since it has been obtained by multiplying the three values associated to the deletion of the tag bib (0.5), the jumping of the tag book (0.5) and the swapping of title by name (0.9).
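As a quick check of this arithmetic, the cd can be recomputed from the annotations embedded in a path. The sketch below assumes that the cd is always the product of the values attached to the SWAP, JUMP and DELETE annotations, which generalizes the two products reported in this section; the function name chance_degree is hypothetical.

```python
import math
import re

# Matches annotations such as [SWAP=0.9], [JUMP=0.5] or [DELETE=0.5].
ANNOTATION = re.compile(r"\[(SWAP|JUMP|DELETE)=([0-9]*\.?[0-9]+)\]")


def chance_degree(path):
    """Multiply the values of all debugging annotations found in `path`.

    A path without annotations (the original query) gets cd 1.0.
    """
    values = [float(value) for _, value in ANNOTATION.findall(path)]
    return math.prod(values, start=1.0)


for path in ("/bib/book/title",
             "/bib/[DELETE=0.5][SWAP=0.9]name",
             "/[DELETE=0.5][JUMP=0.5]//[SWAP=0.9]name"):
    print(path, round(chance_degree(path), 3))
```

Annotations such as DEEP are deliberately ignored by the regular expression, since the text only describes the cd in terms of the three debugging commands.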

Figure 8: Execution of query <</[DELETE=0.5][JUMP=0.5]//[SWAP=0.9]name>>

The wide range of alternatives (Figure 3 is still incomplete) reveals the flexibility of our technique. The programmer is free to execute the alternative queries and to inspect the results until the expectations are met.

Finally, we would like to remark that, even though we have worked in our examples with a very simple query with three tags, our technique also works with more complex queries with longer paths and connectives in conditions, as well as with DEBUG used in several places in the query. For instance, in Figure 9 (compare it with Figure 3) we show the result of debugging the following query: <<[DEBUG=0.7]/bib/[DEBUG=0.6]book/[DEBUG=0.5]title>>. Moreover, in Figure 10 we debug a query which needs to SWAP, in its condition, the wrong occurrence of <<cost>> by the similar word <<price>>. Note that the first alternative produces (after deleting the tag <<classic>>) exactly the same query (/bib[DEEP=0.5]//book[@year<2000 avg{3,1}@price<50]/title), but our debugger also produces more alternatives based on the JUMP and SWAP commands, whose further execution is intended to produce new interesting results.

Figure 9: Debugging query <<[DEBUG=0.7]/bib/[DEBUG=0.6]book/[DEBUG=0.5]title>>

 

Figure 10: Debugging effects on XPath conditions associated to the complex query:
[DEBUG=0.6]/bib/classic/[DEEP=0.8]//book[@year<2000 avg{3,1} @cost<50]/title
