MarcXimiL : near duplicates detection (and similarity analysis)

MarcXimiL is an open source tool that works on MARCXML records and computes similarity indices between them. After a short theoretical introduction, the tutorial focuses on how to install, configure and use the tool. Among other things, the tool can be used to:
* prevent the creation of duplicates (similar records are shown during the validation process)
* identify duplicates in batch files before ingest
* find duplicates inside a collection
* suggest to users records similar to the one found after a request
* match related documents, e.g. preprints and articles
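To make the idea of a similarity index between MARCXML records concrete, here is a minimal sketch in Python. It is not MarcXimiL's code or API: the helper names (`make_record`, `title_of`, `title_similarity`) are hypothetical, only the title field (245 $a) is compared, and the standard library's `difflib.SequenceMatcher` stands in for whatever similarity measures the tool actually uses.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher
from xml.sax.saxutils import escape

# Namespace of the MARC 21 XML ("MARCXML") schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def make_record(title):
    """Hypothetical helper: build a tiny MARCXML record holding only a title."""
    return (
        '<record xmlns="http://www.loc.gov/MARC21/slim">'
        '<datafield tag="245" ind1=" " ind2=" ">'
        '<subfield code="a">' + escape(title) + "</subfield>"
        "</datafield>"
        "</record>"
    )


def title_of(record_xml):
    """Return the normalized 245 $a title of a record, or '' if it is absent."""
    root = ET.fromstring(record_xml)
    node = root.find(
        "marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
    )
    if node is None or node.text is None:
        return ""
    return node.text.strip().lower()


def title_similarity(record_a, record_b):
    """Illustrative similarity index in [0, 1] based on titles only."""
    a, b = title_of(record_a), title_of(record_b)
    if not a or not b:
        # Assumption: records without a title are treated as not comparable.
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    original = make_record("Near duplicates detection in bibliographic data")
    variant = make_record("Near-duplicate detection in bibliographic data")
    other = make_record("Quantum chromodynamics at high temperature")
    print(title_similarity(original, variant))
    print(title_similarity(original, other))
```

In a duplicate-detection setting, pairs whose index exceeds a chosen threshold would be flagged for review; the threshold and the set of compared fields are choices to tune per collection.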


Presented at:
CERN Workshop on Innovations in Scholarly Communication (OAI7), Geneva, June 22-24, 2011
Year:
2011
Note:
Tutorial session
 Record created 2011-06-26, last modified 2018-01-28
