
Infoscience

 
research article

Impact of phylogeny on the inference of functional sectors from protein sequence data

Dietler, Nicola
•
Abbara, Alia
•
Choudhury, Subham
•
Bitbol, Anne Florence
September 1, 2024
PLoS Computational Biology

Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated with functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e., from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.
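To make the two premises above (conservation versus correlations) concrete, here is a minimal sketch on toy data. It is not the authors' code or their exact model. Every name and parameter in it (`D`, `KAPPA`, `TARGET`, `sample_alignment`, `icod_scores`, the 8-site size) is a hypothetical illustration. The assumptions: sites are binary (+1/-1); sites 0 to 2 have a nonzero mutational effect `D_i` on an additive trait under quadratic (nonlinear) selection; sequences are drawn independently, so phylogeny is deliberately absent. The ICOD-style score is taken as the leading eigenvector of the inverse covariance matrix with its diagonal set to zero, which is our reading of the name (inverse covariance off-diagonal). The published method may differ, for example in amino-acid encoding and regularization. Conservation is approximated by the simplest binary proxy, |⟨s_i⟩|.

```python
import itertools
import math
import random

# Hypothetical toy model (illustrative values, not taken from the paper):
# 8 binary sites (+1/-1); sites 0-2 have a nonzero mutational effect D_i on an
# additive trait, the remaining sites have no effect on it.
D = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
KAPPA = 1.0   # strength of the quadratic (nonlinear) selection on the trait
TARGET = 1.0  # trait value favored by selection
L = len(D)


def energy(seq):
    """Quadratic selection penalty on the additive trait sum_i D_i * s_i."""
    trait = sum(d * s for d, s in zip(D, seq))
    return 0.5 * KAPPA * (trait - TARGET) ** 2


def sample_alignment(n, seed=0):
    """Draw n independent sequences with probability proportional to exp(-energy).

    All 2**L sequences are enumerated, so this only works for small L.
    No phylogeny: the sampled sequences share no common ancestry.
    """
    states = list(itertools.product((-1, 1), repeat=L))
    weights = [math.exp(-energy(s)) for s in states]
    return random.Random(seed).choices(states, weights=weights, k=n)


def mean_and_covariance(seqs):
    """Per-site means and the connected covariance matrix of the alignment."""
    n = len(seqs)
    mean = [sum(s[i] for s in seqs) / n for i in range(L)]
    cov = [[sum(s[i] * s[j] for s in seqs) / n - mean[i] * mean[j]
            for j in range(L)] for i in range(L)]
    return mean, cov


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def icod_scores(cov, iters=500):
    """Leading eigenvector of the inverse covariance with its diagonal zeroed."""
    m = invert(cov)
    for i in range(L):
        m[i][i] = 0.0
    # Shifting by the largest absolute row sum makes every eigenvalue non-negative,
    # so power iteration converges to the largest (most positive) eigenvalue.
    shift = max(sum(abs(x) for x in row) for row in m)
    v = [1.0] * L
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(L)) + shift * v[i] for i in range(L)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


seqs = sample_alignment(10_000)
mean, cov = mean_and_covariance(seqs)
icod = icod_scores(cov)
conservation = [abs(x) for x in mean]  # simplest conservation proxy for +/-1 sites

top_icod = sorted(sorted(range(L), key=lambda i: -abs(icod[i]))[:3])
top_cons = sorted(sorted(range(L), key=lambda i: -conservation[i])[:3])
print("ICOD top sites:", top_icod)
print("Conservation top sites:", top_cons)
```

In this phylogeny-free toy setting, the sector sites are both biased (so they look conserved) and collectively correlated, so both scores should rank sites 0 to 2 highest. Enumerating all 2^L sequences avoids Monte Carlo sampling for such a small L. Adding shared ancestry to the sampling step, which this sketch omits, is what makes the comparison between methods nontrivial.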

Type: research article
DOI: 10.1371/journal.pcbi.1012091
Scopus ID: 2-s2.0-85204680322
PubMed ID: 39312591

Author(s):
Dietler, Nicola (École Polytechnique Fédérale de Lausanne)
Abbara, Alia (École Polytechnique Fédérale de Lausanne)
Choudhury, Subham (École Polytechnique Fédérale de Lausanne)
Bitbol, Anne Florence (École Polytechnique Fédérale de Lausanne)

Date Issued: 2024-09-01
Published in: PLoS Computational Biology
Volume: 20
Issue: 9
Article Number: e101209
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: UPBITBOL, LCSB
Funding:
Funder: European Research Council
Funding(s): European Union’s Horizon 2020 research and innovation programme
Grant Number: 851173

Available on Infoscience: January 24, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/243787

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.