Infoscience

journal article

Optimization and historical contingency in protein sequences

Bitbol, Anne-Florence  
February 8, 2024
Biophysical Journal

Protein sequences are shaped by functional optimization on the one hand and by evolutionary history, i.e., phylogeny, on the other. A multiple sequence alignment of homologous proteins contains sequences that evolved from the same ancestral sequence and have similar structure and function. In such an alignment, correlations in amino acid usage at different sites can arise from coevolution driven by structural and functional constraints, but also from historical contingency. Correlations arising from phylogeny often confound the coevolution signal from functional or structural optimization, impairing the inference of structural contacts from sequences. However, inferred Potts models are more robust than local statistics to these effects, which may explain their success. Dedicated corrections can further increase this robustness. Moreover, phylogenetic correlations can in fact provide useful information for some inference tasks, especially for inferring interaction partners from sequences among the paralogs of two protein families. In this case, signal from phylogeny and signal from constraints combine constructively, and explicitly exploiting both further improves inference performance. Protein language models have recently been applied to sequence data, greatly advancing structure, function and mutational effect prediction. Language models trained on multiple sequence alignments capture coevolution and structural contacts, but also phylogenetic relationships. They are able to disentangle signal from structural constraints and from phylogeny more efficiently than Potts models, and they have promising generative properties. Furthermore, they allow the prediction of interacting partners from protein sequences, outperforming traditional coevolution methods on difficult datasets.
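As an illustration of the "local statistics" mentioned in the abstract, the following is a minimal sketch of the mutual information between two columns of a multiple sequence alignment. It is not the author's method: the function names `column` and `mutual_information` and the four-sequence toy alignment are assumptions made up for this example, and no pseudocounts, sequence reweighting or phylogenetic corrections are applied.

```python
from collections import Counter
from math import log


def column(msa, i):
    """Return the residues found at site i across all aligned sequences."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in nats) between sites i and j of an alignment.

    Uses raw empirical frequencies only: no pseudocounts and no
    sequence reweighting (both are simplifying assumptions).
    """
    n = len(msa)
    f_i = Counter(column(msa, i))    # single-site counts at site i
    f_j = Counter(column(msa, j))    # single-site counts at site j
    f_ij = Counter(zip(column(msa, i), column(msa, j)))  # pair counts
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


# Hypothetical toy alignment: sites 0 and 1 always change together,
# site 2 is fully conserved, site 3 varies independently of site 0.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]

print(mutual_information(toy_msa, 0, 1))  # correlated pair of sites
print(mutual_information(toy_msa, 0, 3))  # independent pair of sites
```

A pairwise score of this kind cannot tell whether sites 0 and 1 covary because of a structural or functional constraint or simply because the sequences share ancestry, which is the confounding effect the abstract describes.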

Type: journal article
DOI: 10.1016/j.bpj.2023.11.344
Author(s): Bitbol, Anne-Florence (EPFL)
Date Issued: 2024-02-08
Publisher: Elsevier
Published in: Biophysical Journal
Volume: 123
Issue: 3
Start page: 44a
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: UPBITBOL
Available on Infoscience: December 24, 2025
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/257315
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved