MarcXimiL : near duplicates detection (and similarity analysis)
MarcXimiL is an open-source tool that computes similarity indices between MARCXML records. After a short theoretical introduction, the tutorial focuses on how to install, configure, and use the tool. Its uses include:
* preventing the creation of duplicates (similar records are shown during the validation process)
* identifying duplicates in batch files before ingest
* finding duplicates within a collection
* suggesting to users records similar to the one found by a search
* matching related documents, e.g. preprints and articles
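To make the idea of a similarity index between two MARCXML records concrete, here is a minimal sketch in Python. It is not MarcXimiL's own code or algorithm: it assumes the comparison is limited to the title (field 245, subfield a) and uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The helper names `make_record`, `title_of`, and `title_similarity` are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

# Namespace used by MARCXML ("MARC21 slim") records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def make_record(title: str) -> str:
    """Build a tiny MARCXML record holding only a 245$a title (illustrative data)."""
    return (
        '<record xmlns="http://www.loc.gov/MARC21/slim">'
        '<datafield tag="245" ind1=" " ind2=" ">'
        f'<subfield code="a">{title}</subfield>'
        "</datafield></record>"
    )


def title_of(record_xml: str) -> str:
    """Return the normalized 245$a title of one record, or '' if it is missing."""
    record = ET.fromstring(record_xml)
    node = record.find(
        "marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
    )
    if node is None or node.text is None:
        return ""
    return node.text.strip().lower()


def title_similarity(xml_a: str, xml_b: str) -> float:
    """Similarity in [0, 1] between two records, based on titles only.

    Assumption: a character-level ratio stands in for a real similarity
    index; records without a title are treated as not comparable (0.0).
    """
    title_a, title_b = title_of(xml_a), title_of(xml_b)
    if not title_a or not title_b:
        return 0.0
    return SequenceMatcher(None, title_a, title_b).ratio()


if __name__ == "__main__":
    first = make_record("Near duplicates detection in library catalogs")
    second = make_record("Near duplicate detection in library catalogues")
    print(title_similarity(first, second))
```

In a workflow like the ones listed above, pairs scoring above some chosen threshold would be flagged as candidate duplicates for a person to review. The threshold and the choice of fields to compare are decisions this sketch leaves open.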
Record created on 2011-06-26, modified on 2016-08-09