Infoscience

doctoral thesis

Emergent semantics: rethinking interoperability for large scale decentralized information systems

Cudré-Mauroux, Philippe  
2007

In the past, the problem of semantic interoperability in information systems was mostly solved by means of centralization, both at a system and at a logical level. This approach has been successful to a certain extent, but offers limited scalability and flexibility. Peer-to-Peer systems as a new brand of system architectures indicate that the principles of decentralization and self-organization might offer new solutions to many problems that scale well to very large numbers of users, or to systems where central authorities do not prevail. Therefore, we suggest a new way of building global agreements, i.e., semantic interoperability, based on decentralized, self-organizing interactions only.

In the first part of this thesis, we discuss traditional data integration techniques relying on global schemas, perfect schema mappings and contained query rewritings. We elaborate on the current ecology of the World Wide Web, where autonomous information sources come and go in dynamic and unpredictable ways. In the current environment, data, schemas and schema mappings can all be generated without human intervention and get encoded in syntactic structures with limited expressivity. We argue that traditional top-down integration techniques are inapplicable to that new context and propose a new integration architecture based on decentralized mappings and dynamic self-organization.

In the second part of this thesis, we propose a set of principles to foster semantic interoperability in very large scale information systems. We start by introducing new metrics for the schema mappings, based on both syntactic losses (completeness) and semantic mismatch (soundness) to selectively reformulate queries in a decentralized network of heterogeneous parties. We detail analytical methods to evaluate our metrics, and show how to take advantage of those methods to gradually alleviate mapping inconsistencies across the network. We describe a totally decentralized message passing scheme using belief propagation on transitive closures of schema mapping operations to efficiently evaluate the degree of semantic mismatch between pairs of acquainted information systems. Finally, we propose a graph-theoretic analysis of the network of mappings to quantify the quality of the global agreement that can be achieved in that way.

The third and last part of this thesis is devoted to the presentation of two systems illustrating the practical applicability of our ideas. The first system we introduce, GridVine, is a Semantic Overlay Network supporting decentralized data integration techniques through pairwise schema mappings and monotonic schema inheritance. GridVine follows the principle of data independence by separating a logical layer, the semantic overlay for managing and mapping data and schemas, from a physical layer consisting of a self-organizing Peer-to-Peer overlay network for efficient routing of messages. The second system, called PicShark, takes advantage of semi-structured metadata to meaningfully share pictures in collaborative settings. PicShark builds on our principles to dynamically create both annotations and mappings, and to gradually minimize information entropy – in terms of missing metadata and schematic heterogeneity – in a self-organizing and decentralized context.

Throughout this thesis, we advocate a holistic view on semantics in large-scale information systems: we model semantics as bottom-up and dynamic agreements among heterogeneous parties. We consider both the representation of semantics and the discovery of the interpretation of symbols as the result of a self-organizing process performed by distributed agents whose utility functions depend on the proper interpretation of the symbols. Our view sharply contrasts with previous top-down contributions analyzing data sources in isolation or focusing on global vocabularies and rigid sets of interpretations curated off-line.

In a world where digital information is abundant but human attention remains scarce, we believe that autonomous, best-effort processes such as the ones proposed throughout this thesis will play an ever increasing role in complementing traditional top-down integration approaches to handle massive amounts of digitalized and heterogeneous information assets.

Type
doctoral thesis
DOI
10.5075/epfl-thesis-3690
Author(s)
Cudré-Mauroux, Philippe  
Advisors
Aberer, Karl  
Jury

Avigdor Gal, Monika Henzinger, Zachary Ives

Date Issued

2007

Publisher

EPFL

Publisher place

Lausanne

Public defense date

2006-12-01

Thesis number

3690

Total of pages

215

Subjects

  • heterogeneous databases
  • semantic interoperability
  • peer data management
  • bases de données hétérogènes
  • interopérabilité sémantique
  • gestion des données pair-à-pair

Note

Published as Emergent semantics: interoperability in large-scale decentralized information systems (EPFL Press, 2008, ISBN 978-1-4200-9227-1)

URL

Award: http://vpaa.epfl.ch/page14975-fr.html
EPFL units
LSIR  
Faculty
IC  
Section
IC-SSC  
School
IIF  
Award

EPFL Doctorate Award

2007
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/235414

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved