Infoscience

research article

ProtMamba: a homology-aware but alignment-free protein state space model

Sgarbossa, Damiano • Malbranke, Cyril • Bitbol, Anne-Florence  
June 13, 2025
Bioinformatics

Motivation: Protein language models are enabling advances in elucidating the sequence-to-function mapping, and have important applications in protein design. Models based on multiple sequence alignments efficiently capture the evolutionary information in homologous protein sequences, but multiple sequence alignment construction is imperfect.

Results: We present ProtMamba, a homology-aware but alignment-free protein language model based on the Mamba architecture. In contrast with attention-based models, ProtMamba efficiently handles very long context, comprising hundreds of protein sequences. It is also computationally efficient. We train ProtMamba on a large dataset of concatenated homologous sequences, using two GPUs. We combine autoregressive modeling and masked language modeling through a fill-in-the-middle training objective. This makes the model adapted to various protein design applications. We demonstrate ProtMamba’s usefulness for sequence generation, motif inpainting, fitness prediction, and modeling intrinsically disordered regions. For homolog-conditioned sequence generation, ProtMamba outperforms state-of-the-art models. ProtMamba’s competitive performance, despite its relatively small size, sheds light on the importance of long-context conditioning.

Availability: A Python implementation of ProtMamba is freely available in our GitHub repository: https://github.com/Bitbol-Lab/ProtMamba-ssm and archived at https://doi.org/10.5281/zenodo.15584634.

Supplementary Information: Supplementary data are available at Bioinformatics online.
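
As an illustration of the fill-in-the-middle objective mentioned in the abstract, the following is a minimal generic sketch, not the authors' implementation. It assumes that masked spans of a target sequence are replaced by numbered mask tokens and appended after the sequence, so that an autoregressive model can predict them from the full surrounding context. The token names (`<cls>`, `<mask-i>`) and the function names `fim_transform` and `build_example` are hypothetical and chosen only for this example.

```python
SEP = "<cls>"  # hypothetical separator token placed between homologous sequences


def fim_transform(seq, spans):
    """Move the given (start, end) spans to the end of the sequence.

    Each span is replaced in place by a numbered mask token and then
    appended after the sequence as the same mask token followed by the
    original residues. Spans are assumed to be non-overlapping.
    """
    kept_parts = []    # unmasked residues interleaved with mask tokens
    moved_parts = []   # masked residues, each introduced by its mask token
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans), start=1):
        mask = f"<mask-{i}>"
        kept_parts.append(seq[cursor:start])
        kept_parts.append(mask)
        moved_parts.append(mask + seq[start:end])
        cursor = end
    kept_parts.append(seq[cursor:])
    return "".join(kept_parts) + "".join(moved_parts)


def build_example(homologs, target, spans):
    """Concatenate homologs as context, then append the transformed target."""
    context = SEP.join(homologs)
    return context + SEP + fim_transform(target, spans)


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    print(build_example(["MKV", "MRV"], "MKLV", [(1, 3)]))
```

Trained left to right on strings of this form, a model sees both the homolog context and the residues on each side of a masked span before it has to generate that span, which is how a single autoregressive objective can also cover inpainting-style tasks.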

Type
research article
DOI
10.1093/bioinformatics/btaf348
Author(s)
Sgarbossa, Damiano  

École Polytechnique Fédérale de Lausanne

Malbranke, Cyril  

École Polytechnique Fédérale de Lausanne

Bitbol, Anne-Florence  

École Polytechnique Fédérale de Lausanne

Date Issued

2025-06-13

Publisher

Oxford University Press (OUP)

Published in
Bioinformatics
Subjects

Protein sequences • homologous proteins • protein language model • protein fitness prediction • protein design

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
UPBITBOL  
Available on Infoscience
June 16, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/251374