Fine-tuning protein language models to understand the functional impact of missense variants
Elucidating the functional effects of missense variants is crucial yet challenging. To investigate their impact, we fine-tuned protein language models, including ESM2 and ProtT5, to classify 20 protein features at amino acid resolution. In addition, we trained a fully connected neural network classifier on frozen embeddings and compared its performance to fine-tuning in order to quantify the added value of task-specific adaptation. We then used the fine-tuned models to: 1) identify protein features enriched in either pathogenic or benign missense variants, and 2) compare the predicted feature profiles of proteins with reference and alternate alleles to understand how missense variants affect protein functionality. We show that our models can be used to reclassify variants of uncertain significance and provide mechanistic insights into the functional consequences of missense mutations.
10.1016_j.csbj.2025.05.022.pdf
Main Document
Published version
openaccess
CC BY
2.35 MB
Adobe PDF
01a10ad2f0f26d81e039a971a8fb5fde