research article

Impact of phylogeny on structural contact inference from protein sequence data

Dietler, Nicola • Lupo, Umberto • Bitbol, Anne-Florence
February 8, 2023
Journal of the Royal Society Interface

Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that mutate early in the phylogeny yield false positive contacts. We consider natural data and realistic synthetic data, and our findings generalize to these cases. Our results highlight the impact of phylogeny on contact prediction from protein sequences and illustrate the interplay between the rich structure of biological data and inference.
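To make the notion of a local method concrete, here is a minimal sketch that scores pairs of alignment columns by mutual information, one of the two local statistics named in the abstract. It is not the authors' implementation. The function names, the toy alignment, and the choice of natural logarithms are assumptions made for illustration, and the sketch omits pseudocounts, sequence reweighting, and any phylogenetic correction.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `msa` is a list of equal-length strings, one per sequence.
    Frequencies are plain empirical counts, with no pseudocounts or weights.
    """
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)

    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts to avoid extra divisions
        ratio = p_ab * n * n / (counts_i[a] * counts_j[b])
        mi += p_ab * math.log(ratio)
    return mi


def rank_site_pairs(msa):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): column_mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 2 is conserved, column 3 varies independently of column 0.
toy_msa = ["ACDA", "ACDC", "GTDA", "GTDC"]
ranking = rank_site_pairs(toy_msa)
```

In this toy alignment the top-ranked pair is `(0, 1)`, the only pair of columns that covaries. In real alignments, shared ancestry can produce high scores between sites that are not in contact, which is the effect the article studies.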

Type: research article
DOI: 10.1098/rsif.2022.0707
Web of Science ID: WOS:000934228400007
Author(s): Dietler, Nicola; Lupo, Umberto; Bitbol, Anne-Florence
Date Issued: 2023-02-08
Publisher: ROYAL SOC
Published in: Journal of the Royal Society Interface
Volume: 20
Issue: 199
Article Number: 20220707

Subjects: Multidisciplinary Sciences • Science & Technology - Other Topics • protein sequences • inference • contact prediction • phylogeny • modelling • data analysis • direct-coupling analysis • correlated mutations • residue contacts • information • capture • model

Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: UPBITBOL
Available on Infoscience: March 27, 2023
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/196453