Infoscience

conference presentation

Automating Data Imports in a DSpace-CRIS’s Institutional Repository

Rodrigues de Matos, Jorge • Sicot, Julien
June 17, 2025
The 20th International Conference on Open Repositories

The migration of Infoscience, EPFL’s institutional repository, to DSpace-CRIS required a custom Python-based pipeline to automate the ingestion of research outputs and datasets. Limitations in default DSpace-CRIS import tools, such as insufficient query controls, incomplete metadata mappings, and a lack of deduplication mechanisms, necessitated a tailored approach.

The pipeline leverages the DSpace REST API to enable precise queries, metadata reconciliation, and robust deduplication. It incorporates fallback mechanisms, such as publisher-specific APIs, for full-text retrieval when standard tools like Unpaywall and CrossRef prove insufficient. Key challenges included reconciling authorship with EPFL directories, aligning metadata across diverse collections, and maintaining data consistency during imports.
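The deduplication and fallback ideas described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`normalize_doi`, `deduplicate`, `find_fulltext`), the record shape (a dict with an optional `"doi"` key), and the resolver interface are all assumptions made for the example. Real resolvers would wrap calls to services such as Unpaywall, CrossRef, or a publisher API; here they are passed in as plain callables so the sketch runs without network access.

```python
import re
from typing import Callable, Iterable, Optional

# Hypothetical resolver interface: takes a DOI, returns a full-text URL or None.
Resolver = Callable[[str], Optional[str]]


def normalize_doi(raw: str) -> str:
    """Reduce a DOI to a canonical lowercase form without URL or 'doi:' prefix."""
    doi = raw.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)


def deduplicate(records: Iterable[dict], known_dois: Iterable[str]) -> list[dict]:
    """Drop records whose DOI is already known or repeated within the batch."""
    seen = {normalize_doi(d) for d in known_dois}
    fresh = []
    for record in records:
        key = normalize_doi(record.get("doi") or "")
        if not key:
            # No DOI: this sketch cannot decide, so the record is kept
            # for some other matching step (e.g. title comparison).
            fresh.append(record)
            continue
        if key in seen:
            continue
        seen.add(key)
        fresh.append(record)
    return fresh


def find_fulltext(
    doi: str, resolvers: list[tuple[str, Resolver]]
) -> Optional[tuple[str, str]]:
    """Try each resolver in order; return (resolver name, URL) for the first hit."""
    for name, resolver in resolvers:
        try:
            url = resolver(doi)
        except Exception:
            # A failing source should not abort the chain; move to the next one.
            continue
        if url:
            return name, url
    return None
```

Ordering the resolvers list from the most general source to the most specific one mirrors the fallback behaviour the abstract describes: a publisher-specific resolver is only consulted when the earlier ones return nothing.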

The developer track presentation will provide a visual breakdown of the pipeline’s architecture, highlight key challenges, and illustrate the solutions implemented. The presentation will complement this by delving deeper into the technical details and lessons learned. Both formats will offer practical insights for repository managers and developers seeking to automate data imports and optimize workflows in institutional repositories.

Name: EPFL_Infoscience-Imports_OR2025_with_videos.pptx
Type: Presentation
Version: Not Applicable (or Unknown)
Access type: openaccess
License Condition: CC BY
Size: 314.44 MB
Format: Microsoft Powerpoint XML
Checksum (MD5): 8877efbbf64809e4af359c7d5b6a1996

Name: EPFL_Infoscience-Imports_OR2025.pdf
Type: Main Document
Version: Not Applicable (or Unknown)
Access type: openaccess
License Condition: CC BY
Size: 2.99 MB
Format: Adobe PDF
Checksum (MD5): a5503a1349e798003c370349d07927f7
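The MD5 checksums listed for each file can be used to confirm that a downloaded copy is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names (`iter_chunks`, `md5_of_chunks`, `matches_checksum`) are illustrative, and the local file path in the usage note is an assumption about where the file was saved. MD5 is suitable for detecting accidental corruption only, not deliberate tampering.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def iter_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file in fixed-size chunks so large files are not loaded at once."""
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hexadecimal MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the value published on the record page."""
    return md5_of_chunks(iter_chunks(path)) == expected_md5.strip().lower()
```

Assuming the PDF was saved to the current directory, `matches_checksum(Path("EPFL_Infoscience-Imports_OR2025.pdf"), "a5503a1349e798003c370349d07927f7")` would return `True` for an uncorrupted download. Chunked reading matters mostly for the 314.44 MB presentation file.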


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved