Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Student works
  4. Development of a flexible tool for the automatic comparison of bibliographic records. Application to sample collections - Développement d'un logiciel flexible pour la comparaison de notices bibliographiques et application à différentes collections
 
master thesis

Development of a flexible tool for the automatic comparison of bibliographic records. Application to sample collections - Développement d'un logiciel flexible pour la comparaison de notices bibliographiques et application à différentes collections

Borel, Alain  orcid-logo
•
Krause, Jan
2009

Due to the multiplication of digital bibliographic catalogues (open repositories, library and bookseller catalogues), information specialists are facing the challenge of mass-processing huge amounts of metadata for various purposes. Among the many possible applications, determining the similarity between records is an important issue. Such a similarity can be interesting from a bibliographic point of view (i.e., do the records describe the same document, the answer to which can be useful for deduplication or for collection overlap studies) as well as from a thematic point of view (suggestion of documents to the user, as well as content management within the framework of a library policy, automatic classification of documents, and so on). In order to fulfil such various needs, we propose a flexible, open-source, multiplatform software tool supporting the implementation of multiple strategies for record comparisons. In a second step, we study the relevance and performance of several algorithms applied to a selection of collections (size, origin, document types...).

  • Files
  • Details
  • Metrics
Type
master thesis
Author(s)
Borel, Alain  orcid-logo
Krause, Jan
Advisors
Savoy, Jacques
Date Issued

2009

Subjects

Information retrieval

•

Bibliographic similarity and duplicate detection

•

MARC

•

XML

•

Python

Note

Diplôme universitaire de formation continue en information documentaire (CESID), Université de Genève

URL

URL

http://marcximil.sourceforge.net
Written at

EPFL

EPFL units
VPA-SISB  
Available on Infoscience
October 13, 2009
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/43674
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés