Database Integration: an Overview of Issues and Approaches
In many large companies, the widespread use of computers has led to the installation of a number of different application-specific databases. As company structures evolve, boundaries between departments move, creating new business units. Their new applications will use existing data from various data stores rather than new data entering the organization. Hence, the ability to make data stores interoperable becomes a crucial factor for the development of new information systems.
Data interoperability may come in various degrees. At the lowest level, commercial gateways connect specific pairs of database management systems (DBMSs). Software providing facilities for defining persistent views over different databases simplifies access to distant data but does not support automatic enforcement of consistency constraints among different databases. Full interoperability is achieved by distributed or federated database systems, which support integration of existing data into virtual databases (i.e., databases which are logically defined but not physically materialized). Federated systems, in particular, allow existing databases to remain under the control of their respective owners, thus supporting a harmonious coexistence of scalable data integration and site autonomy requirements.
Federated systems are very popular today. However, before they become marketable, many issues remain to be solved. Design issues focus on either human-centered aspects (cooperative work, including autonomy issues and negotiation procedures) or database-centered aspects (data integration, schema/database evolution). Operational issues concern system interoperability, mainly in terms of support for new transaction types, new query processing algorithms, security concerns, etc. General overviews may be found elsewhere [4, 9]. This paper is devoted to database integration, possibly the most critical issue.
Simply stated, database integration is the process which takes as input a set of databases and produces as output a single unified description of the input schemas (the integrated schema), together with the associated mapping information supporting integrated access to existing data through the integrated schema. As such, database integration is also used in the process of re-engineering an existing legacy system. Database integration has attracted many diverse and diverging contributions. The purpose and main intended contribution of this article is to provide a clear picture of the approaches and current solutions, and of what remains to be achieved.
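The input/output view of database integration given above can be illustrated with a small sketch. It is only a toy under strong assumptions: schemas are reduced to type names with attribute sets, correspondences between types are supplied by hand, and corresponding types are merged by a plain union of their attributes. The names `integrate`, `IntegrationResult`, `schemas`, and `correspondences` are hypothetical and chosen for illustration; realistic integration must also detect correspondences and resolve conflicts, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class IntegrationResult:
    # Integrated schema: integrated type name -> union of attributes.
    integrated_schema: Dict[str, Set[str]]
    # Mapping information: integrated type name -> (database, source type) pairs.
    mappings: Dict[str, List[Tuple[str, str]]]


def integrate(
    schemas: Dict[str, Dict[str, Set[str]]],
    correspondences: Dict[Tuple[str, str], str],
) -> IntegrationResult:
    """Naively merge the input schemas into one integrated schema.

    schemas:         database name -> {type name -> attribute names}
    correspondences: (database name, type name) -> integrated type name;
                     types without an entry keep their own name.
    """
    integrated: Dict[str, Set[str]] = {}
    mappings: Dict[str, List[Tuple[str, str]]] = {}
    for db_name, schema in schemas.items():
        for type_name, attributes in schema.items():
            target = correspondences.get((db_name, type_name), type_name)
            # Assumption: corresponding types are merged by attribute union.
            integrated.setdefault(target, set()).update(attributes)
            # Record where the integrated type's data actually lives.
            mappings.setdefault(target, []).append((db_name, type_name))
    return IntegrationResult(integrated, mappings)


# Hypothetical example: two departmental databases describing customers.
schemas = {
    "sales": {"Customer": {"id", "name"}},
    "billing": {"Client": {"id", "address"}},
}
correspondences = {("billing", "Client"): "Customer"}
result = integrate(schemas, correspondences)
```

Here `result.integrated_schema` holds a single `Customer` type with the attributes of both sources, while `result.mappings` records that its data remains in `sales.Customer` and `billing.Client`. Nothing is copied, in keeping with the notion of a virtual database that is logically defined but not materialized.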