Effectively managing a large collection of multimedia documents is a challenge addressed by many disciplines, from signal processing through database systems to artificial intelligence and interaction design. The problems to be solved have rarely been considered together. We propose a series of novel solutions for the system architecture, the characterization of document content, the retrieval methodology, and the user interaction schemes. We propose and describe a communicating-components architecture that uses a new open and flexible messaging protocol. Its foundation, the Multimedia Retrieval Markup Language (MRML), is specified, with examples of the benefits of this approach. The general multimedia application domain is restricted mainly to image documents that carry semantic annotation such as captions or metadata. The integration of perceptual and semantic content descriptions into a single retrieval structure is the principal contribution of our work. The proposed method allows for improved effectiveness, augmented functionality, and high flexibility for extension to other media types such as audio or video. The relationships between the two representational characteristics can be exploited for more effective retrieval and for other tasks such as semi-automatic image annotation or illustration of semantic concepts. The final contribution is the study and implementation of intuitive user interaction metaphors that attempt to further bridge the gap between the user's and the system's notions of relevance. A rich set of tools for formulating information needs is offered to users with varying skills and requirements. Merging query specification with result visualization and browsing, using interactively explorable search spaces, offers a single access point to most retrieval tasks.