This lecture introduces systematically into the problem of managing large data collections in peer-to-peer systems. Search over large datasets has always been a key problem in peer-to-peer systems and the peer-to-peer paradigm has incited novel directions in the field of data management. This resulted in many novel peer-to-peer data management concepts and algorithms, for supporting data management tasks in a wider sense, including data integration, document management and text retrieval. The lecture covers four different types of peer-to-peer data management systems that are characterized by the type of data they manage and the search capabilities they support. The first type are structured peer-to-peer data management systems which support structured query capabilities for standard data models. The second type are peer-to-peer data integration systems for querying of heterogeneous databases without requiring a common global schema. The third type are peer-to-peer document retrieval systems that enable document search based both on the textual content and the document structure. Finally, we introduce semantic overlay networks, which support similarity search on information represented in hierarchically organized and multi-dimensional semantic spaces. Topics that go beyond data representation and search are summarized at the end of the lecture.
978-3-031-00719-4
2011
Synthesis Lectures on Data Management; Vol. 3, No. 2, Pages 1-150