Infoscience

research article

rstoolbox - a Python library for large-scale analysis of computational protein design data and structural bioinformatics

Bonet, Jaume • Harteveld, Zander • Sesterhenn, Fabian • Scheck, Andreas • Correia, Bruno E.
May 15, 2019
BMC Bioinformatics

Background
Large-scale datasets of protein structures and sequences are becoming ubiquitous in many domains of biological research. Experimental approaches and computational modelling methods are generating biological data at an unprecedented rate. The detailed analysis of structure-sequence relationships is critical to unveil the governing principles of protein folding, stability and function. Computational protein design (CPD) has emerged as an important structure-based approach to engineer proteins for novel functions. Generally, CPD workflows rely on the generation of large numbers of structural models to search for the optimal structure-sequence configurations. As such, an important step of the CPD process is the selection of a small subset of sequences to be experimentally characterized. Given the limitations of current CPD scoring functions, multi-step design protocols and elaborate analysis of the decoy populations have become essential for the selection of sequences for experimental characterization and for the success of CPD strategies.

Results
Here, we present rstoolbox, a Python library for the analysis of large-scale structural data tailored for CPD applications. rstoolbox is oriented towards both CPD software users and developers, and is easily integrated into analysis workflows. For users, it offers the ability to profile and select decoy sets, which may guide multi-step design protocols or follow-up experimental characterization. rstoolbox provides intuitive solutions for the visualization of large sequence/structure datasets (e.g. logo plots and heatmaps) and facilitates the analysis of experimental data obtained through traditional biochemical techniques (e.g. circular dichroism and surface plasmon resonance) and high-throughput sequencing. For CPD software developers, it provides a framework to easily benchmark and compare different CPD approaches. Here, we showcase rstoolbox in both types of applications.

Conclusions
rstoolbox is a library for the evaluation of protein structure datasets, tailored for CPD data. It provides interactive access through seamless integration with IPython, while still being suitable for high-performance computing. In addition to its functionalities for data analysis and graphical representation, the inclusion of rstoolbox in protein design pipelines will make it easy to standardize the selection of design candidates and will improve the overall reproducibility and robustness of CPD selection processes.

Type
research article
DOI
10.1186/s12859-019-2796-3
Web of Science ID

WOS:000468042200004

Author(s)
Bonet, Jaume  
Harteveld, Zander  
Sesterhenn, Fabian  
Scheck, Andreas  
Correia, Bruno E.  
Date Issued

2019-05-15

Published in
BMC Bioinformatics
Volume

20

Issue

240

Start page

240

Subjects

Biochemical Research Methods • Biotechnology & Applied Microbiology • Mathematical & Computational Biology • Biochemistry & Molecular Biology • rstoolbox • computational protein design • protein structural metrics • scoring • data analysis • fold

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LPDI  
Available on Infoscience
June 18, 2019
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/157258
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved