Semantic validation in spatio-temporal schema integration

This thesis addresses the well-known database integration problem with a new method that combines database conceptual modeling techniques with logic-based reasoners. We elaborate a hybrid approach (modeling plus validation) to spatio-temporal information integration at the schema level. The modeling part of our methodology is supported by the spatio-temporal conceptual model MADS, whereas the validation part of the integration process is delegated to description logic validation services. We thus adhere to the principle that, rather than extending either formalism to cover all desirable functionality, a hybrid system in which the database component and the logic component cooperate, each performing the tasks for which it is best suited, is a viable solution for semantically rich information management.

First, we develop a flexible MADS-based integration approach in which the designer of the integrated schema has several viable ways to construct the final integrated schema. For different related schema elements, we provide the designer with four general policies and, within each policy, a set of structural solutions, or structural patterns. To guarantee that an integrated solution always exists, we provide a preservation policy with a multi-representation structural pattern. To state the inter-schema mappings, we elaborate a correspondence language with explicit spatial and temporal operators. Our correspondence language thus has three facets (structural, spatial, and temporal), which make it possible to relate the thematic representation as well as the spatial and temporal features. With the inter-schema mappings, the designer can state correspondences between related populations and define the conditions that rule the matching at the instance level. These matching rules can then be used in query rewriting procedures or to match instances during data integration. We associate a set of putative structural patterns with each type of population correspondence, giving the designer a selection of patterns for flexible construction of the integrated schema.

Second, we enhance our integration method by employing the validation services of the description logic formalism. There is no guarantee that the designer can state all the inter-schema mappings manually, nor that they are all correct. We therefore add a validation phase to ensure the validity and completeness of the set of inter-schema mappings. Inter-schema mappings cannot be validated autonomously, i.e., they are validated against the data model and the schemas they link. To implement our validation approach, we therefore translate the data model, the source schemas, and the inter-schema mappings into a description logic formalism, preserving the spatial and temporal semantics of the MADS data model. Our modeling approach in description logic thus ensures that the model designer defines spatial and temporal schema elements and inter-schema mappings correctly. The added value of the complete translation (i.e., including the data model and the source schemas) is that we validate not only the inter-schema mappings but also the compliance of the source schemas with the data model, and infer implicit relationships within them. As the result of the validation procedure, the schema designer obtains a complete and valid set of inter-schema mappings and a set of valid (flexible) schematic patterns to apply in constructing an integrated schema that meets application requirements.

To further our work, we model a framework in which a schema designer can follow our integration method and carry out the schema integration task in an assisted way. We design two models, a UML model and a SEAM model, of a system that provides the integration functionalities. The models describe a framework in which several tools are employed together, each handling the service it is best suited for. We define the functionalities of, and the cooperation between, the components of the framework, and detail the logic of the integration process in a UML activity diagram and in a SEAM operation model.
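
As an informal illustration of the three-facet correspondence language described in the abstract, the following minimal Python sketch models an inter-schema mapping with a structural, a spatial, and a temporal facet, and runs a toy check of the mapping against the schema elements it links. It is not the notation or the validation procedure used in the thesis: every name here (SchemaElement, Correspondence, PopulationRelation, validate, and the operator strings) is hypothetical, and the check only mimics the idea that mappings are validated against the schemas rather than in isolation. A description logic reasoner would do considerably more, such as inferring implicit relationships.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class PopulationRelation(Enum):
    # Hypothetical set-theoretic relations between two related populations.
    EQUIVALENT = "equivalent"
    INCLUDES = "includes"
    OVERLAPS = "overlaps"
    DISJOINT = "disjoint"


@dataclass(frozen=True)
class SchemaElement:
    schema: str              # name of the source schema (illustrative)
    name: str                # name of the object type in that schema
    spatial: bool = False    # assumption: element carries a geometry
    temporal: bool = False   # assumption: element carries a time extent


@dataclass(frozen=True)
class Correspondence:
    left: SchemaElement
    right: SchemaElement
    structural: PopulationRelation   # structural facet
    spatial: Optional[str] = None    # spatial facet, e.g. "equals" (illustrative)
    temporal: Optional[str] = None   # temporal facet, e.g. "during" (illustrative)


def validate(c: Correspondence) -> List[str]:
    """Check a mapping against the elements it links; return found problems."""
    problems = []
    if c.spatial and not (c.left.spatial and c.right.spatial):
        problems.append("spatial facet used on a non-spatial element")
    if c.temporal and not (c.left.temporal and c.right.temporal):
        problems.append("temporal facet used on a non-temporal element")
    return problems


# Usage: Road has a geometry only; Street has a geometry and a time extent.
road = SchemaElement("S1", "Road", spatial=True)
street = SchemaElement("S2", "Street", spatial=True, temporal=True)

ok = Correspondence(road, street, PopulationRelation.INCLUDES, spatial="equals")
bad = Correspondence(road, street, PopulationRelation.INCLUDES, temporal="during")

print(validate(ok))   # []
print(validate(bad))  # ['temporal facet used on a non-temporal element']
```

Keeping the facets as separate optional fields reflects the abstract's point that the thematic representation, the spatial features, and the temporal features can each be related independently.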

    Thesis, École polytechnique fédérale de Lausanne (EPFL), n° 3423 (2006)
    Section d'informatique
    Faculté informatique et communications
    Institut d'informatique fondamentale
    Laboratoire de bases de données
    Jury: Alessandro Artale, Nadine Cullot, Emre Telatar, Alain Wegmann

    Public defense: 2006-01-27


    Record created on 2005-11-28, modified on 2016-08-08
