Dal Peraro, Matteo
Krapp, Lucien Fabrice
2023-10-30
2023
10.5075/epfl-thesis-10309
https://infoscience.epfl.ch/handle/20.500.14299/201949
Proteins, the central building blocks of life, play pivotal roles in nearly every biological function. To do so, these macromolecules interact with their environment in complex ways, giving rise to diverse functional behaviors. Predicting these interactions, especially protein-protein interfaces and other molecular interactions, has long been a major challenge in structural biology. However, with the recent surge in advanced computational methods, significant breakthroughs are now within reach. We developed the Protein Structure Transformer (PeSTo), a deep learning method built on a novel operation, the geometric transformer. PeSTo requires only the atomic coordinates and element names of the structure as input. This general approach allows the model to be applied to many different tasks without computationally expensive data processing. The method accurately predicts protein-protein binding interfaces, outperforming state-of-the-art methods. We extended PeSTo to predict protein binding interfaces in general, detecting and distinguishing protein interfaces with nucleic acids, ligands, ions and lipids. The defining advantages of PeSTo are its low computational cost and robustness. Unlike many existing tools, PeSTo allows high-throughput processing of structural data, including molecular dynamics ensembles. This efficiency enabled us to predict binding interfaces for all AlphaFold-predicted structures. This collection of binding interfaces, which we call the "interfaceome", can help identify protein binding domains and accelerate research.
Beyond protein interface prediction, PeSTo has been applied to another challenging problem in protein design: the prediction of protein sequences from backbone scaffolds. The newly trained model, called CARBonAra (Context-aware Amino acid Recovery from Backbone Atoms and heteroatoms), performs on par with state-of-the-art methods in terms of in silico sequence recovery rate. Unlike other methods, CARBonAra can predict amino acid sequences from a backbone scaffold in the presence of non-protein atoms such as nucleic acids and ligands. This ability to account for non-protein entities when designing protein sequences opens many possibilities, including the design of proteins that interact with specific molecules, such as nucleic acids, with potential applications in therapeutics and biotechnology.
In conclusion, the development of PeSTo represents a significant step forward in the application of deep learning to structural biology. It not only provides an efficient and accurate tool for predicting protein interactions, but also opens a new frontier in protein design that takes non-protein entities into account. By leveraging the rapidly expanding body of protein structure data, PeSTo holds vast potential for a broad spectrum of applications in structural biology and materials science.
en
structural biology; deep learning; protein-protein interactions; protein binding interfaces; protein design; inverse folding problem; geometric transformers
A Geometric Transformer for Structural Biology: Development and Applications of the Protein Structure Transformer
thesis::doctoral thesis