A model of information retrieval based on a terminological logic.

In Proceedings of SIGIR-93, 16th International Conference on Research and Development in Information Retrieval, pages 298-307, Pittsburgh, PA, 1993.


Abstract:
According to recent research, the task of Information Retrieval (IR) can successfully be described as the extraction, from a given document base, of those documents d that, given a query q, make the formula d -> q valid, where d and q are formulae of the chosen logic and "->" denotes the kind of logical implication formalized by that logic. While essentially subscribing to this view, we propose that the logic chosen for this endeavour be a Terminological Logic (TL). Under this view the IR task becomes that of singling out those documents d such that q subs d, where d and q are terms of the chosen TL and "subs" denotes subsumption between terms. We argue that TLs are particularly suitable for modelling IR by showing that they can be employed to represent documents under a variety of aspects (e.g. structural, layout, content), to represent queries, and to represent domain and lexical knowledge. Because a single logical language is used for all these representational tasks, all these sources of knowledge participate in the retrieval process in a principled way. We introduce MIRTL, a TL for modelling IR according to these guidelines, and describe its syntax, formal semantics and inferential algorithm.
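To make the retrieval condition "q subs d" concrete, here is a minimal sketch. It is not MIRTL: it assumes a much smaller, EL-like term language in which a term is a conjunction of primitive concept names plus existential role restrictions ("some R C"), with no terminology (TBox). For that fragment, structural subsumption is sound and complete: q subsumes d when every concept name in q also appears in d, and every restriction "some R C" in q is matched by some restriction "some R D" in d with C subsuming D. The names `Term`, `term`, `subsumes`, `retrieve`, and the concept and role names in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Term:
    """Hypothetical EL-like term: a conjunction of concept names and
    existential restrictions, each stored as a (role, filler term) pair."""
    atoms: FrozenSet[str] = frozenset()
    exists: FrozenSet[Tuple[str, "Term"]] = frozenset()


def term(*atoms: str, exists: Iterable[Tuple[str, Term]] = ()) -> Term:
    """Convenience constructor for building terms."""
    return Term(frozenset(atoms), frozenset(exists))


def subsumes(q: Term, d: Term) -> bool:
    """Structural subsumption test: True if every instance of d is an instance of q."""
    # Every concept name required by q must also be asserted by d.
    if not q.atoms <= d.atoms:
        return False
    # Every restriction (some R C) in q needs a matching (some R D) in d
    # whose filler D is subsumed by C.
    return all(
        any(r_d == r_q and subsumes(c_q, c_d) for r_d, c_d in d.exists)
        for r_q, c_q in q.exists
    )


def retrieve(query: Term, documents: dict) -> list:
    """Return the names of documents d such that query subs d."""
    return [name for name, d in documents.items() if subsumes(query, d)]


# Toy document base. With no TBox, domain knowledge such as
# "every TerminologicalLogic is a Logic" is folded into d1 by hand.
docs = {
    "d1": term("Paper", exists=[("about", term("TerminologicalLogic", "Logic")),
                                ("author", term("Person"))]),
    "d2": term("Paper", exists=[("about", term("Databases"))]),
    "d3": term("Book", exists=[("about", term("Logic"))]),
}

q = term("Paper", exists=[("about", term("Logic"))])
print(retrieve(q, docs))  # ['d1']
```

Only d1 is retrieved: d2's topic is not subsumed by "Logic", and d3 is not a "Paper". A full TL such as MIRTL would instead derive facts like "TerminologicalLogic implies Logic" from a terminology during the subsumption test, rather than requiring them to be written into each document term.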