Foundations of a Logic-based approach to Multimedia Document Retrieval.

PhD thesis, Department of Computer Science, University of Dortmund, June 1999.


Multimedia Information Retrieval (MIR), i.e. the retrieval of those multimedia objects in a collection that are relevant to a user's information need, is an intensively investigated research area. It draws on several fields of computer science, notably Information Retrieval, Image Retrieval, Audio Retrieval, Video Retrieval, Databases and Artificial Intelligence. This variety shows that MIR involves many different aspects, each requiring a specific background and methodology, and that approaches may differ not only within a single discipline but also across disciplines. A principled description of a MIR model requires the formal specification of three basic entities of retrieval: (i) the representation of multimedia objects; (ii) the representation of a user's information need (called a query); and (iii) the retrieval function, which returns a ranked list of objects for each information need. We believe that any MIR model should address the two-dimensional nature of multimedia objects, namely their form and their semantics (or meaning). The form of an object is a collective name for all its media-dependent features, whereas the semantics of an object is a collective name for those features that pertain to the slice of the real world being represented, which exists independently of any object referring to it. Unlike form, the semantics of an object is thus media independent. Corresponding to these two dimensions, there are three categories of retrieval: one for each dimension (form-based retrieval and semantics-based retrieval) and one concerning their combination. Form-based retrieval methods automatically create the object representations used in retrieval by extracting features from multimedia objects, such as the number of occurrences of words in a text, the colour distribution of an image, or the frame sequences of a video.
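As an illustration of the three entities above in the form-based case, the following Python sketch represents texts by word-occurrence counts and images by a quantised colour distribution, and defines a retrieval function that ranks a collection by cosine similarity to the query representation. Cosine similarity, the bin count and all function names (`text_features`, `colour_histogram`, `cosine`, `retrieve`) are assumptions chosen for illustration; they are common choices in form-based retrieval, not the model proposed in this thesis.

```python
from collections import Counter
import math

def text_features(text):
    """Form feature of a text: number of occurrences of each lower-cased word."""
    return Counter(text.lower().split())

def colour_histogram(pixels, bins=4):
    """Form feature of an image: normalised RGB colour distribution.

    `pixels` is a list of (r, g, b) tuples with components in 0..255;
    each channel is quantised into `bins` buckets (an assumed parameter)."""
    hist = Counter()
    for r, g, b in pixels:
        hist[(r * bins // 256, g * bins // 256, b * bins // 256)] += 1
    total = sum(hist.values())
    return {key: count / total for key, count in hist.items()}

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, collection):
    """Retrieval function: rank every object id by similarity to the query."""
    scored = [(obj_id, cosine(query, feats)) for obj_id, feats in collection.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For instance, with `docs = {"d1": text_features("a girl in a red dress"), "d2": text_features("a red car")}`, the call `retrieve(text_features("girl dress"), docs)` ranks `d1` (score of about 0.5) ahead of `d2` (score 0.0). The same `retrieve` function works unchanged on colour histograms, since both feature kinds are stored as sparse dictionaries.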
Semantics-based retrieval methods rely on a symbolic representation of the aboutness of multimedia objects, e.g. ``this image is about a girl'', that is, on descriptions formulated in some suitable formal language. User queries may thus address both dimensions, e.g. ``find images about girls wearing clothes with a texture like this''. Here, the texture addresses an image feature (form), whereas the aboutness addresses the meaning of an image (semantics). Although several MIR models have been proposed, little work has been done on MIR models that tackle all three categories of retrieval in a principled way. Promising candidates are models based on the so-called logic-based approach to information retrieval. This thesis is a contribution in this direction. We propose an object-oriented data model for representing the medium-dependent features of multimedia objects (form properties) and a four-valued fuzzy Horn description logic for representing the semantics of multimedia objects and domain knowledge (medium-independent features, i.e. semantic properties). In particular, the logic is characterised by (i) a description logic component, which allows the representation of the structured objects of interest in the real world; (ii) a Horn rule component, which allows us to reason about structured objects; (iii) a non-classical, four-valued semantics, which allows us to deal with possible inconsistencies arising from the representation of document semantics; and (iv) a fuzzy component, which allows for the treatment of the inherent imprecision in multimedia document representation and retrieval. Retrieval is then defined in terms of logical entailment, with the object-oriented data model integrated within the logic.
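The interplay of the fuzzy, four-valued and Horn rule components can be sketched in a deliberately simplified form. In the Python below, which is a hypothetical illustration and not the logic defined in the thesis, every ground atom carries two independent degrees in [0, 1]: evidence that it is true and evidence that it is false. Collapsing them at a threshold yields Belnap's four truth values (true, false, both, unknown). Ground Horn rules are fired to a fixpoint using the minimum for conjunction, and a conjunctive query is scored by the minimum truth degree of its atoms, so a form-based similarity degree and a semantic aboutness degree can be combined in a single query. The description logic constructors are omitted entirely; the use of min as fuzzy conjunction, the threshold 0.5 and all names (`Evidence`, `forward_chain`, `query_degree`, and the `About`, `Wears`, `SimilarTexture` and `AboutGirlInDress` predicates) are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """Independent degrees in [0, 1] of evidence that an atom is true and that it is false."""
    true: float = 0.0
    false: float = 0.0

    def belnap(self, threshold: float = 0.5) -> str:
        """Collapse the two degrees into one of Belnap's four truth values."""
        t, f = self.true >= threshold, self.false >= threshold
        return {(True, False): "true", (False, True): "false",
                (True, True): "both", (False, False): "unknown"}[(t, f)]

def forward_chain(facts, rules):
    """Fire ground Horn rules (head, [body atoms]) until nothing changes.

    A conjunctive body gets the minimum truth degree of its atoms; a head keeps
    the maximum degree derived for it. Only truth degrees are propagated;
    falsity degrees stay as asserted. Degrees only increase and are drawn from
    a finite set of values, so the loop terminates."""
    kb = {atom: Evidence(e.true, e.false) for atom, e in facts.items()}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            degree = min(kb.get(b, Evidence()).true for b in body)
            current = kb.setdefault(head, Evidence())
            if degree > current.true:
                current.true = degree
                changed = True
    return kb

def query_degree(kb, atoms):
    """Degree of a conjunctive query: the minimum truth degree of its atoms."""
    return min(kb.get(a, Evidence()).true for a in atoms)

# Hypothetical knowledge base: annotators disagree on whether img1 shows a girl,
# and the texture similarity to the query image q comes from form-based matching.
facts = {
    ("About", "img1", "girl"): Evidence(true=0.8, false=0.6),
    ("Wears", "img1", "dress"): Evidence(true=0.9),
    ("SimilarTexture", "img1", "q"): Evidence(true=0.7),
}
rules = [
    (("AboutGirlInDress", "img1"), [("About", "img1", "girl"), ("Wears", "img1", "dress")]),
]
kb = forward_chain(facts, rules)
```

Here `kb[("About", "img1", "girl")].belnap()` returns `"both"`, reflecting the conflicting annotations. Unlike in classical logic, where an inconsistent knowledge base entails everything, the inconsistency stays local: `query_degree(kb, [("AboutGirlInDress", "img1"), ("SimilarTexture", "img1", "q")])` evaluates to 0.7, the minimum of the derived aboutness degree 0.8 and the texture similarity 0.7.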
The rationale for these choices is as follows. The principles of object-oriented design, namely aggregation, classification and generalisation, have been widely used for representing multimedia objects, which shows their appropriateness for representing medium-dependent features. The components of the proposed logic, for their part, have been thoroughly investigated, and a wide range of results, automated reasoning techniques and systems are available, which makes the model a viable tool for practical use in MIR. The main feature of the model is that all three categories of retrieval mentioned above are addressed within a formal, flexible and extensible framework. The model allows us to represent both the form properties and the semantic properties of multimedia data, neatly combining different techniques (notably database techniques and semantic information processing, i.e. knowledge representation and reasoning) with the aim of developing intelligent multimedia retrieval systems.
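The three object-oriented principles can be illustrated with a small, hypothetical Python data model for form properties: generalisation as a class hierarchy (`Image` and `Video` specialise `MediaObject`), aggregation as composite attributes (an image consists of regions, a video of key frames), and classification as membership of objects in classes. The class and attribute names are assumptions for illustration only and do not reproduce the data model defined in the thesis.

```python
from dataclasses import dataclass, field

@dataclass
class MediaObject:
    """Generalisation: the common superclass of all multimedia objects."""
    oid: str

@dataclass
class Region:
    """A part of an image described by media-dependent (form) features."""
    colour_histogram: dict
    texture: list

@dataclass
class Image(MediaObject):
    """An Image is a MediaObject that aggregates a list of regions."""
    width: int
    height: int
    regions: list = field(default_factory=list)

@dataclass
class Video(MediaObject):
    """A Video is a MediaObject that aggregates a sequence of key frames (images)."""
    frames: list = field(default_factory=list)

def objects_of_class(objects, cls):
    """Classification: the objects that are instances of `cls`, including its subclasses."""
    return [o for o in objects if isinstance(o, cls)]
```

Because `isinstance` follows the class hierarchy, asking for instances of `MediaObject` returns both images and videos, while asking for `Image` returns only the images.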