Schema matching for structured document transformations

This dissertation studies the problem of structured document content reuse. In structured document content reuse, a document (or part of a document) structured under one schema must be restructured and translated into an instance of a different schema. The problem is therefore closely tied to the notion of structure transformation. In practice, such transformations are typically implemented by writing translators case by case in specific transformation languages. Writing and managing complex transformation programs is time-consuming and generally requires programming skills. Many solutions have been proposed to simplify and automate, as far as possible, the specification and execution of structured document transformations. Several simpler, highly declarative transformation languages and graphical tools for specifying transformations have been introduced to avoid programming. These languages and tools are very useful for describing and specifying transformations. However, they still require developers to manually indicate mappings for each source and target pair. Generating mappings manually is an extremely labor-intensive and error-prone process. To shield users from performing this task manually, we advocate the use of a schema matching process that (semi-)automatically finds semantic correspondences, so-called mappings, between two heterogeneous schemas. Based on such mappings, a transformation generator automatically generates the translation program. In this dissertation, we propose a framework for solving the XML schema matching problem. We rely on the extraction of semantic information nested within XML structures.
Semantics is first captured by making the meanings of element names explicit and by exploiting several element characteristics, including the XML Schema designer's point of view, as expressed by the logical organisation of XML content, and additional semantic information given by features such as datatypes, element constraints and inheritance mechanisms. The proposed framework provides a generic view of the overall matching process and formalizes it. The proposed solution makes it possible to (1) semi-automatically discover an efficient sequence of operations for transforming a source XML schema into a target XML schema using schema matching techniques, (2) model the discovered mappings, and (3) automatically generate the transformation script. Note that the purpose of our work is not to replace programming languages for XML data translation, but rather to complement them. The experiments we conducted show good performance in generating source-to-target mappings.
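
As a rough illustration of what a schema matching step produces, the sketch below pairs elements of two small schemas by combining name similarity with datatype compatibility. It is not the algorithm developed in the dissertation: the two schemas, the function names `name_similarity` and `match`, the flattened name-to-datatype representation and the 0.5 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical flattened schemas: element name -> XML Schema datatype.
SOURCE = {"authorName": "xs:string", "pubDate": "xs:date", "price": "xs:decimal"}
TARGET = {"author_name": "xs:string", "publicationDate": "xs:date", "cost": "xs:decimal"}


def name_similarity(a, b):
    """String similarity in [0, 1] between two element names, ignoring case and underscores."""
    def normalize(name):
        return name.replace("_", "").lower()
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match(source, target, threshold=0.5):
    """Map each source element to the most similar target element of the same datatype.

    Elements whose best candidate scores below the (assumed) threshold are left unmapped.
    """
    mappings = {}
    for s_name, s_type in source.items():
        best_name, best_score = None, threshold
        for t_name, t_type in target.items():
            if s_type != t_type:
                continue  # datatype acts as a simple compatibility filter
            score = name_similarity(s_name, t_name)
            if score >= best_score:
                best_name, best_score = t_name, score
        if best_name is not None:
            mappings[s_name] = best_name
    return mappings


if __name__ == "__main__":
    for src, tgt in match(SOURCE, TARGET).items():
        print(f"{src} -> {tgt}")
```

In this toy case `price` and `cost` remain unmapped even though their datatypes agree, because their names share almost no characters. Purely string-based matching misses such pairs, which is one reason to draw on the meaning of element names rather than their spelling alone.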

    Thesis, École polytechnique fédérale de Lausanne (EPFL), no. 3108 (2004)
    Section d'informatique
    Faculté informatique et communications
    Institut des systèmes informatiques et multimédias
    Jury: Gilles Falquet, Roger Hersch, Vincent Quint, Martin Rajman

    Public defense: 2004-10-29


    Record created on 2005-03-16, modified on 2016-08-08

