Handling probabilistic integrity constraints in pay-as-you-go reconciliation of data models

Nguyen Quoc Viet Hung; Weidlich, Matthias; Nguyen Thanh Tam; Miklos, Zoltan; Aberer, Karl; Gal, Avigdor; Stantic, Bela

doi:10.1016/j.is.2019.04.002

research article


July 1, 2019

Information Systems

Data models capture the structure and characteristic properties of data entities, e.g., in terms of a database schema or an ontology. They are the backbone of diverse applications, ranging from information integration, through peer-to-peer systems and electronic commerce, to social networking. Many of these applications involve models of diverse data sources. Effective utilisation and evolution of data models, therefore, calls for matching techniques that generate correspondences between their elements. Various such matching tools have been developed in the past. Yet, their results are often incomplete or erroneous, and thus need to be reconciled, i.e., validated by an expert. This paper analyses the reconciliation process in the presence of large collections of data models, where the network induced by generated correspondences shall meet consistency expectations in terms of integrity constraints. We specifically focus on how to handle data models that show some internal structure and potentially differ in terms of their assumed level of abstraction. We argue that such a setting calls for a probabilistic model of integrity constraints, for which satisfaction is preferred, but not required. In this work, we present a model for probabilistic constraints that enables reasoning on the correctness of individual correspondences within a network of data models, in order to guide an expert in the validation process. To support pay-as-you-go reconciliation, we also show how to construct a set of high-quality correspondences, even if an expert validates only a subset of all generated correspondences. We demonstrate the efficiency of our techniques for real-world datasets comprising database schemas and ontologies from various application domains. © 2019 Elsevier Ltd. All rights reserved.
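The abstract does not spell out the constraint model, so the following is only a minimal, generic sketch of what "satisfaction is preferred, but not required" can mean for a network of correspondences; it is not the authors' formulation. Assumptions, all hypothetical: candidate correspondences carry matcher confidences, a single cycle (transitivity) constraint is used, each violation multiplies a possible world's weight by a penalty factor `PENALTY = 0.1` instead of zeroing it, and expert feedback simply discards contradicting worlds. The names `CANDIDATES`, `marginals`, `most_uncertain` and `instantiate` are illustrative only. Exhaustive enumeration is exponential in the number of candidates and serves solely to make the semantics concrete.

```python
from itertools import combinations, product
from math import log2

# Hypothetical matcher output: (attribute, attribute) -> matcher confidence.
CANDIDATES = {
    ("A.name", "B.title"): 0.9,
    ("B.title", "C.label"): 0.8,
    ("A.name", "C.label"): 0.3,
    ("A.name", "C.id"): 0.5,
}

PENALTY = 0.1  # assumed soft-constraint factor per violation (0 would make it hard)


def violations(world):
    """Count violated cycle constraints: in a triangle of candidate
    correspondences, any two selected ones imply the third."""
    selected = {frozenset(c) for c, keep in world.items() if keep}
    candidates = {frozenset(c) for c in world}
    attrs = sorted({a for c in world for a in c})
    count = 0
    for x, y, z in combinations(attrs, 3):
        edges = [frozenset((x, y)), frozenset((y, z)), frozenset((x, z))]
        if all(e in candidates for e in edges):
            if sum(e in selected for e in edges) == 2:
                count += 1
    return count


def marginals(candidates, feedback=None):
    """Probability that each correspondence is correct, by enumerating all
    possible worlds (subsets of candidates judged correct)."""
    feedback = feedback or {}
    keys = list(candidates)
    totals = dict.fromkeys(keys, 0.0)
    z = 0.0
    for bits in product((True, False), repeat=len(keys)):
        world = dict(zip(keys, bits))
        # Worlds that contradict expert feedback get zero weight.
        if any(world[c] != v for c, v in feedback.items()):
            continue
        w = 1.0
        for c, keep in world.items():
            p = candidates[c]
            w *= p if keep else 1.0 - p
        w *= PENALTY ** violations(world)  # penalise, but do not forbid
        z += w
        for c, keep in world.items():
            if keep:
                totals[c] += w
    return {c: totals[c] / z for c in keys}


def most_uncertain(probs, feedback=None):
    """Pick the unvalidated correspondence with maximal binary entropy,
    as one plausible way to choose what to show the expert next."""
    feedback = feedback or {}

    def entropy(p):
        if p in (0.0, 1.0):
            return 0.0
        return -(p * log2(p) + (1 - p) * log2(1 - p))

    unvalidated = [c for c in probs if c not in feedback]
    return max(unvalidated, key=lambda c: entropy(probs[c]))


def instantiate(probs, threshold=0.5):
    """Hypothetical rule for a 'high-quality' set: keep likely-correct ones."""
    return {c for c, p in probs.items() if p >= threshold}
```

For instance, once an expert confirms `A.name`~`B.title` and `B.title`~`C.label`, the marginal for `A.name`~`C.label` rises from its matcher confidence of 0.3 to about 0.81 (0.3 / 0.37): the world in which it is absent violates the cycle constraint and is down-weighted by the penalty, yet remains possible.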

Type

research article

DOI

10.1016/j.is.2019.04.002

Web of Science ID

WOS:000469906000011

Author(s)

Nguyen Quoc Viet Hung

Weidlich, Matthias

Nguyen Thanh Tam

Miklos, Zoltan

Aberer, Karl

Gal, Avigdor

Stantic, Bela

Date Issued

2019-07-01

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

Published in

Information Systems

Volume

83

Start page

166

End page

180

Subjects

Computer Science, Information Systems

Computer Science

data integration

probabilistic constraints

model reconciliation

schema

patterns

web

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units

LSIR

Available on Infoscience

June 18, 2019

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/157826