Designing effective retrieval engines for multimedia document repositories.

In AI*IA 1996 Workshop on "Access, Extraction and Integration of Knowledge", Napoli, Italy, 1996.


Introduction:
A principled approach to the design of a retrieval engine for multimedia document (MD) repositories starts from the identification of a suitable retrieval model, i.e. a formal specification of the three basic entities of retrieval: documents, users' information needs, and the matching function, which assigns a set of documents to each information need. Modelling the retrieval of MDs requires taking into account (at least) the following three orthogonal dimensions: form, i.e. the structural component of a document; content, i.e. the meaning of a document; and uncertainty, i.e. the imprecision in the system's estimation of the relevance of a document to a user's information need. When documents are matched at the form level, uncertainty affects the system's evaluation of the structural similarity between documents and queries; at the content level, it affects the system's evaluation of the overlap between their information content.
A multimedia retrieval model should thus combine concepts and techniques from two worlds: digital signal processing, which contributes notions for representing the form of documents and algorithms for assessing the similarity between them; and symbolic processing, which contributes conceptual models of reality for representing document contents (and related knowledge) and algorithms for reasoning about them in a way that captures relevance. A strong interaction between these two souls is crucial to the adequacy of a multimedia retrieval model. This interaction is still largely missing: current image retrieval models, for instance, either adhere firmly to one of the two paradigms or can be decomposed into two independent sub-models, each belonging to one paradigm.
To achieve full integration, the model must relate the signal (form) and symbolic (content) dimensions to each other, so that features pertaining to form can be addressed from within the same expressions used to address document content. In the rest of this paper we briefly illustrate fragments of a model that we are incrementally developing and that tries to capture these aspects in a unified, well-founded framework.
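As a purely illustrative aid, and not the model developed in this paper, the following Python sketch shows one way a single query expression could address form and content together while returning graded (uncertain) matches. Every name in it (Document, similar_form, about, both, retrieve) is hypothetical. The use of cosine similarity for form, concept overlap for content, and the minimum as conjunction are assumptions chosen only to keep the example small.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Document:
    name: str
    form: Tuple[float, ...]   # signal-level features (hypothetical, e.g. a colour histogram)
    content: FrozenSet[str]   # symbolic description of what the document is about


# A condition grades one document with a degree of match; the grade stands in
# for the uncertainty dimension. Grades lie in [0, 1] for non-negative features.
Condition = Callable[[Document], float]


def similar_form(reference: Sequence[float]) -> Condition:
    """Form-level condition: cosine similarity to a reference feature vector."""
    def grade(doc: Document) -> float:
        dot = sum(a * b for a, b in zip(doc.form, reference))
        norm = sqrt(sum(a * a for a in doc.form)) * sqrt(sum(b * b for b in reference))
        return dot / norm if norm else 0.0
    return grade


def about(*concepts: str) -> Condition:
    """Content-level condition: fraction of the requested concepts the document covers."""
    wanted = frozenset(concepts)
    def grade(doc: Document) -> float:
        return len(wanted & doc.content) / len(wanted) if wanted else 1.0
    return grade


def both(first: Condition, *rest: Condition) -> Condition:
    """Conjunction of conditions, taken as the minimum grade (an assumption)."""
    def grade(doc: Document) -> float:
        return min(c(doc) for c in (first, *rest))
    return grade


def retrieve(query: Condition, repository: Sequence[Document],
             threshold: float = 0.5) -> List[Tuple[float, str]]:
    """Matching function: graded documents at or above the threshold, best first."""
    graded = [(query(doc), doc.name) for doc in repository]
    return sorted((g for g in graded if g[0] >= threshold), reverse=True)


# Toy repository with made-up feature vectors and content descriptions.
repository = [
    Document("sunset", (0.9, 0.1, 0.0), frozenset({"sea", "sun"})),
    Document("forest", (0.1, 0.9, 0.0), frozenset({"tree"})),
    Document("beach", (0.8, 0.2, 0.0), frozenset({"sea"})),
]

# One query expression addresses form and content together.
query = both(similar_form((1.0, 0.0, 0.0)), about("sea"))
results = retrieve(query, repository)
names = [name for _, name in results]
```

In this toy setting the form condition and the content condition are ordinary values of the same type, so they compose inside one expression instead of living in two independent sub-models. A real model would replace the set-overlap content test with reasoning over a conceptual representation.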