Foundations of a Logic based approach to Multimedia Document
PhD thesis, Department of Computer Science, University
of Dortmund, June 1999.
Multimedia Information Retrieval (MIR), i.e. the retrieval of
those multimedia objects of a collection which are relevant to a user
information need, is an intensively investigated research
area. It involves research from several fields of computer science,
notably, Information Retrieval, Image Retrieval, Audio Retrieval,
Video Retrieval, Databases and Artificial Intelligence.
This variety reveals that many different aspects are involved in MIR,
each requiring a specific background and methodology, and that there
may be different approaches not only within the same discipline, but
also across different ones.
A principled approach to the description of a MIR model requires the
formal specification of three basic entities of retrieval: (i) the
representation of multimedia objects; (ii) the representation
(called query) of a user information need; and (iii) the
retrieval function, returning a ranked list of objects for
each information need.
We believe that any MIR model should
address the bidimensional aspect of multimedia objects: that is, their
form and their semantics (or meaning).
The form of an object is a collective name for all its media
dependent features, whereas the semantics of an object is a
collective name for those features that pertain to the slice of the
real world being represented, which exists independently of the
existence of an object referring to it. Unlike form, the semantics of
an object is thus media independent.
Corresponding to these two dimensions, there are three categories of
retrieval: one for each dimension (form-based retrieval and
semantics-based retrieval) and one concerning the combination of both
of them. Form-based retrieval methods automatically create the object
representations to be used in retrieval by extracting features from
multimedia objects, such as the number of occurrences of words in
text, colour distributions in images, and video frame sequences in
videos. Semantics-based retrieval methods rely on a symbolic
representation of the aboutness of multimedia objects, e.g. ``this image
is about a girl'', that is, on descriptions formulated in some suitable
formal language. User queries may thus address both dimensions,
e.g. ``find images about girls wearing clothes with a texture like
this''. In this query, the texture addresses an image feature (form), whereas
the aboutness addresses the meaning of an image (semantics).
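
As an illustration only, the example query above can be sketched in Python. The `Image` class, its `texture` vector, its `about` labels and the cosine measure are all hypothetical simplifications chosen for this sketch; they are not the data model or the logic proposed in the thesis. The semantic part of the query is treated here as a crisp filter and the form part as a similarity used for ranking.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Hypothetical multimedia object with a form part and a semantic part."""
    name: str
    texture: tuple[float, ...]  # form: an assumed texture feature vector
    about: frozenset[str]       # semantics: assumed symbolic aboutness labels


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(images: list[Image], concept: str,
             example_texture: tuple[float, ...]) -> list[Image]:
    """Keep images about `concept`, ranked by texture similarity to the example."""
    matches = [img for img in images if concept in img.about]
    return sorted(matches,
                  key=lambda img: cosine(img.texture, example_texture),
                  reverse=True)


collection = [
    Image("a", (1.0, 0.0), frozenset({"girl"})),
    Image("b", (0.0, 1.0), frozenset({"girl"})),
    Image("c", (1.0, 0.0), frozenset({"dog"})),
]
ranked = retrieve(collection, "girl", (1.0, 0.0))
print([img.name for img in ranked])
```

Image `c` has the right texture but the wrong aboutness and is therefore never returned, which is the point of addressing both dimensions in one query.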
Although several MIR models have been proposed, little work has
been done on MIR models in which all three
categories of retrieval are tackled in a principled way. Not
surprisingly, promising models involve the so-called logic-based
approach to information retrieval. This thesis is a contribution in
this direction. Indeed, we will propose an object-oriented data
model for representing medium dependent features of multimedia
objects (form properties) and a four-valued fuzzy Horn description logic for
representing multimedia objects' semantics and domain knowledge
(medium independent features, i.e. semantic properties). In particular, the logic is
characterised by (i) a description logic component which allows the
representation of the structured objects (of interest) in the real
world; (ii) a Horn rule component which allows us to reason about
structured objects; (iii) a non-classical, four-valued semantics
which allows us to deal with possible inconsistencies arising from the
representation of document semantics; (iv) a fuzzy component which
allows for the treatment of the inherent imprecision in multimedia
document representation and retrieval. Retrieval is then defined in
terms of logical entailment, where the object-oriented data model has
been integrated within the logic.
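
To give an intuition for how the four-valued and the fuzzy components may interact, the following Python sketch attaches to each statement two independent degrees, one of evidence for it and one of evidence against it. This pairing, and the use of min and max for conjunction and disjunction, is an assumption made for illustration; the class `Degree` and its operators are hypothetical and do not reproduce the definitions given in the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Degree:
    """Assumed truth value: independent degrees of support in [0, 1]."""
    pos: float  # degree of evidence for the statement
    neg: float  # degree of evidence against the statement

    def __invert__(self) -> "Degree":
        # Negation swaps the evidence for and the evidence against.
        return Degree(self.neg, self.pos)

    def __and__(self, other: "Degree") -> "Degree":
        # Conjunction: weakest support for, strongest support against.
        return Degree(min(self.pos, other.pos), max(self.neg, other.neg))

    def __or__(self, other: "Degree") -> "Degree":
        # Disjunction: strongest support for, weakest support against.
        return Degree(max(self.pos, other.pos), min(self.neg, other.neg))


about_girl = Degree(0.8, 0.6)   # conflicting annotations on the same image
about_beach = Degree(0.5, 0.0)  # weak, uncontested evidence

print(about_girl & about_beach)
print(~about_girl)
```

Because `pos` and `neg` are not required to sum to one, a value such as `Degree(0.8, 0.6)` can record partially conflicting evidence locally; a retrieval function in this style could then rank objects by the `pos` degree to which a query is supported.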
The rationale for the above choices is twofold. First, the
principles of object-oriented design, namely aggregation,
classification and generalisation, have been widely used in the
context of multimedia object representation, revealing their
appropriateness for representing medium dependent features.
Second, the components of the proposed logic have been thoroughly
investigated, and a wide range of results, automated reasoning
techniques, and systems are available, which makes the model a viable
tool for practical use in the context of MIR.
The main feature of the model is that all three above-mentioned categories of retrieval are
addressed in a formal, flexible and extensible framework. The model allows us to represent
both form properties and semantic properties of multimedia data, combining in a neat way
different techniques, notably database techniques and semantic information processing
(knowledge representation and reasoning), with the aim of developing
intelligent multimedia retrieval systems.