Infoscience

 
research article

Combining phylogeny and coevolution improves the inference of interaction partners among paralogous proteins

Gandarilla-Perez, Carlos • Pinilla, Sergio • Bitbol, Anne-Florence • Weigt, Martin
March 1, 2023
PLOS Computational Biology

Author summary

When two protein families interact, their sequences feature statistical dependencies. First, interacting proteins tend to share a common evolutionary history. Second, maintaining structure and interactions through the course of evolution yields coevolution, detectable via correlations in the amino-acid usage at contacting sites. Both signals can be used to computationally predict which proteins are specific interaction partners among the paralogs of two interacting protein families, starting just from their sequences. We show that combining them improves the performance of interaction partner inference, especially when the average number of potential partners is large and when the total data set size is modest. The resulting paired multiple-sequence alignments might be used as input to machine-learning algorithms to improve protein-complex structure prediction, as well as to understand interaction specificity in signaling pathways.

Predicting protein-protein interactions from sequences is an important goal of computational biology. Various sources of information can be used to this end. Starting from the sequences of two interacting protein families, one can use phylogeny or residue coevolution to infer which paralogs are specific interaction partners within each species. We show that these two signals can be combined to improve the performance of the inference of interaction partners among paralogs. For this, we first align the sequence-similarity graphs of the two families through simulated annealing, yielding a robust partial pairing. We next use this partial pairing to seed a coevolution-based iterative pairing algorithm. This combined method improves performance over either separate method. The improvement obtained is striking in the difficult cases where the average number of paralogs per species is large or where the total number of sequences is modest.
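
The first stage described above (aligning the two families' sequence-similarity graphs by simulated annealing and keeping only a robust partial pairing) can be illustrated with a small sketch. This is not the authors' implementation: the squared-difference cost in `mismatch`, the geometric cooling schedule, the rule that a pair is "robust" when every independent run agrees on it, and all function and parameter names are assumptions made for illustration. The sketch also assumes that both families have the same number of paralogs in each species and that index `i` refers to the same species in both families. The second, coevolution-based stage is not shown.

```python
import math
import random


def mismatch(sim_a, sim_b, perm):
    """Cost of a pairing: squared differences between family-A similarities
    and the similarities of the family-B sequences paired with them."""
    n = len(perm)
    return sum(
        (sim_a[i][j] - sim_b[perm[i]][perm[j]]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def anneal_pairing(sim_a, sim_b, species, steps=5000,
                   t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing over within-species pairings.

    perm[i] is the family-B index paired with family-A index i.
    Moves swap the partners of two sequences from the same species.
    """
    rng = random.Random(seed)
    groups = {}
    for idx, label in enumerate(species):
        groups.setdefault(label, []).append(idx)

    # Random initial pairing that respects species boundaries.
    perm = list(range(len(species)))
    for members in groups.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for i, j in zip(members, shuffled):
            perm[i] = j

    swappable = [g for g in groups.values() if len(g) > 1]
    if not swappable:
        return perm

    cost = mismatch(sim_a, sim_b, perm)
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i, j = rng.sample(rng.choice(swappable), 2)
        perm[i], perm[j] = perm[j], perm[i]
        new_cost = mismatch(sim_a, sim_b, perm)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
        else:
            perm[i], perm[j] = perm[j], perm[i]  # undo the rejected swap
    return perm


def robust_pairs(sim_a, sim_b, species, runs=5, steps=5000):
    """Keep only the pairs on which all independent annealing runs agree."""
    results = [anneal_pairing(sim_a, sim_b, species, steps=steps, seed=r)
               for r in range(runs)]
    first = results[0]
    return {i: first[i] for i in range(len(species))
            if all(r[i] == first[i] for r in results)}
```

In a full pipeline, the dictionary returned by `robust_pairs` would play the role of the partial pairing that seeds the coevolution-based iterative pairing step. Recomputing the full cost after every swap keeps the sketch short; for realistic family sizes one would update only the terms affected by the swap.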

Type: research article
DOI: 10.1371/journal.pcbi.1011010
Web of Science ID: WOS:000961862600002
Author(s): Gandarilla-Perez, Carlos; Pinilla, Sergio; Bitbol, Anne-Florence; Weigt, Martin
Date Issued: 2023-03-01
Publisher: PUBLIC LIBRARY SCIENCE
Published in: PLOS Computational Biology
Volume: 19
Issue: 3
Article Number: e1011010
Subjects: Biochemical Research Methods • Mathematical & Computational Biology • Biochemistry & Molecular Biology • contacts • residue • identification • prediction • trees • cbl
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: UPBITBOL
Available on Infoscience: May 8, 2023
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/197501

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved