Introduction
User preferences play a key role in information retrieval. In modern Web-based information retrieval systems, users expect to supply keywords and preferences that influence the search results. However, although the technology is still improving, users sometimes become frustrated with retrieval systems that do not offer enough mechanisms or that, even when they offer them, are too rigid. Hence the need arises for flexible query languages in which users can formulate queries according to their preferences, and which adapt to the data schema without increasing complexity. In addition, flexible query languages should be equipped with a mechanism for obtaining a ranked list of answers. The ranking of answers can provide satisfaction degrees depending on several factors.
The XPath language [BBC+07] has been proposed as a standard for XML querying, and it is based on describing the path in the XML tree to be retrieved. XPath allows the user to specify the names of nodes (i.e., tags) and attributes that must be present in the XML tree, together with boolean conditions on the content of nodes and attributes.
The XPath querying mechanism is based on boolean logic: the nodes retrieved by an XPath expression are those matching the given path in the XML tree. Therefore, the user must know the XML schema in order to specify queries. However, even when an XML schema exists, it may not be available to users. Moreover, XML documents with the same XML schema can be very different in structure. Consider the case of XML documents containing the curricula vitae of a group of persons. Although the documents share the same schema, each person can decide to include studies, jobs, training, etc. organized in several ways: by year, by relevance, and with different degrees of nesting. In an XPath-based structural query, the main criteria for providing a certain degree of satisfaction are hierarchical depth and document order. However, user preferences also play a key role in determining the best solutions. Conditions in XPath expressions are commonly ranked, that is, the user gives a higher degree of importance to certain requirements than to others. Therefore, the query language should provide mechanisms for assigning priority to answers when they occur in different parts of the document, as well as priority to queries, with regard to the user's preferences.
We present a fuzzy variant of the XPath query language for flexible information retrieval on XML documents. Our main purpose is to provide a repertoire of operators that make it possible to manage satisfaction degrees by adding structural constraints and fuzzy operators inside conditions (which must be considered from now on as fuzzy conditions instead of boolean conditions), in order to produce a ranked list of answers according to the user's preferences when composing queries. Our proposal has been implemented with a fuzzy logic language, using the FLOPER system designed in our research group, to take advantage of the clear synergies between the target and source fuzzy languages.
Firstly, our approach proposes two structural constraints, called DOWN and DEEP, with which a certain degree of relevance can be associated. Whereas DOWN provides a ranked set of answers depending on the path on which they are found from "top to down" in the XML document, DEEP provides a ranked set of answers depending on the path on which they are found from "left to right" in the XML document. Both structural constraints can be used together, assigning a degree of importance with respect to the distance from the root XML element.
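To make the intent of these structural constraints concrete, the following Python fragment gives a minimal sketch under stated assumptions; it is not the FLOPER-based implementation. We assume that each constraint is a factor in [0, 1], that `deep` is applied once per nesting level, and that `down` is applied once per preceding sibling. The function `rank`, its parameters and the sample document are hypothetical names introduced only for illustration.

```python
import xml.etree.ElementTree as ET


def rank(root, tag, deep=1.0, down=1.0):
    """Return (degree, text) pairs for every element named `tag`.

    Assumed semantics (illustrative only): the degree of a node is
    multiplied by `deep` for each nesting level below the root and by
    `down` for each sibling that precedes it in document order.
    """
    results = []

    def visit(node, degree):
        if node.tag == tag:
            results.append((degree, node.text))
        for position, child in enumerate(node):
            visit(child, degree * deep * (down ** position))

    visit(root, 1.0)
    # Highest satisfaction degree first.
    return sorted(results, key=lambda pair: pair[0], reverse=True)


# Hypothetical curriculum vitae with the same tag at two nesting depths.
doc = ET.fromstring(
    "<cv><job>A</job><history><job>B</job></history></cv>"
)
ranked = rank(doc, "job", deep=0.5, down=0.9)
```

With these assumed factors, the job found directly under the root receives a higher degree than the one nested inside `history`, so it is listed first.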
Secondly, we provide fuzzy variants of and and or for XPath conditions. Crisp and and or operators are used in standard XPath over boolean conditions, and make it possible to impose boolean requirements on the answers. XPath boolean conditions can refer to attribute values and node content, in the form of equality and ranges of literal values, among others. Nevertheless, the and and or operators applied to two boolean conditions are not precise enough when the user does not give the same value to both conditions. For instance, some answers can be discarded although they could be of interest to the user, and others accepted although they are not of interest. Besides, users need to know in which sense one solution is better than another. When several boolean conditions are imposed in a query, each one contributes to satisfying the user's preferences in a different way, and the user's satisfaction may be different for each solution.
We have enriched the arsenal of XPath operators with fuzzy variants of and and or. In particular, we have considered three versions of and: and+, and, and- (and likewise for or: or+, or, or-), which make the composition of fuzzy conditions more flexible. These three versions of each operator come for free from our adaptation of fuzzy logic to the XPath paradigm. One of the best-known elements of fuzzy logic is the introduction of fuzzy versions of the classical boolean operators. Product, Łukasiewicz and Gödel fuzzy logics are considered the most prominent logics and give a suitable semantics to fuzzy operators. Our contribution is to give meaning to fuzzy operators in the XPath paradigm, and particularly with respect to user preferences. We claim that in our work the fuzzy versions provide a mechanism to strengthen (and weaken) conditions, in the sense that stronger (and weaker) user preferences can be modeled with the use of stronger (and weaker) fuzzy conditions. The combination of fuzzy operators in queries makes it possible to specify a ranked set of fuzzy conditions according to the user's requirements.
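As a reminder of the standard definitions, the conjunctions and disjunctions of the three logics mentioned above can be sketched in Python as follows. The function names are hypothetical, and this introduction does not state which logic corresponds to each of and+, and, and- (or or+, or, or-), so no such mapping is assumed here.

```python
def and_lukasiewicz(x, y):
    """Lukasiewicz conjunction: max(0, x + y - 1)."""
    return max(0.0, x + y - 1.0)


def and_product(x, y):
    """Product conjunction: x * y."""
    return x * y


def and_godel(x, y):
    """Goedel conjunction: min(x, y)."""
    return min(x, y)


def or_godel(x, y):
    """Goedel disjunction: max(x, y)."""
    return max(x, y)


def or_product(x, y):
    """Product disjunction: x + y - x * y."""
    return x + y - x * y


def or_lukasiewicz(x, y):
    """Lukasiewicz disjunction: min(1, x + y)."""
    return min(1.0, x + y)


# Two fuzzy conditions satisfied with degrees 0.8 and 0.9.
x, y = 0.8, 0.9
degrees = [
    f(x, y)
    for f in (and_lukasiewicz, and_product, and_godel,
              or_godel, or_product, or_lukasiewicz)
]
```

For any pair of truth degrees in [0, 1], the six results are ordered from the most demanding conjunction (Łukasiewicz) to the most permissive disjunction (Łukasiewicz), which is what allows stronger and weaker requirements to be expressed with the same pair of conditions.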
Furthermore, we have equipped XPath with an additional operator that is also traditional in fuzzy logic: the average operator avg. This operator offers the possibility of explicitly giving weights to fuzzy conditions. When such conditions are rated by avg, solutions increase their weight proportionally. However, from the point of view of user preferences, it forces users to quantify their wishes, which on some occasions can be difficult to measure. For this reason, fuzzy versions of and and or are better choices in some circumstances.
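A weighted average of two truth degrees can be sketched as follows. This is a minimal illustration that assumes the usual arithmetic definition of a weighted mean; the function name `avg` and the weight parameters `a` and `b` are hypothetical and do not reproduce the concrete syntax of our language.

```python
def avg(x, y, a=1.0, b=1.0):
    """Weighted mean of two truth degrees x and y in [0, 1].

    `a` and `b` are the (assumed, non-negative) weights given to the
    first and second condition; equal weights yield the plain average.
    """
    if a + b == 0:
        raise ValueError("at least one weight must be positive")
    return (a * x + b * y) / (a + b)


plain = avg(0.8, 0.4)                 # both conditions count equally
weighted = avg(0.8, 0.4, a=3, b=1)    # first condition counts three times
```

Raising the weight of the better-satisfied condition moves the result towards its degree, which illustrates why the user has to quantify preferences explicitly when using this operator.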
Finally, we have equipped our XPath-based query language with a mechanism for thresholding user preferences, in such a way that the user can request that requirements be satisfied above a certain percentage.
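The effect of thresholding on a ranked list of answers can be sketched as follows. The function `apply_threshold`, the inclusive comparison and the sample answers are assumptions made only for illustration.

```python
def apply_threshold(answers, threshold):
    """Keep answers whose degree reaches `threshold`, best first.

    `answers` is a list of (degree, answer) pairs with degrees in [0, 1].
    The comparison is inclusive here; that choice is an assumption.
    """
    kept = [(degree, answer) for degree, answer in answers
            if degree >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Hypothetical answers with their satisfaction degrees.
answers = [(0.9, "a"), (0.4, "b"), (0.75, "c")]
selected = apply_threshold(answers, 0.7)
```

Here only the answers satisfied to a degree of at least 70% are retained, and they remain sorted by decreasing degree.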