Solute energy based REMD: developments and applications to prion protein misfold predictions

Molecular dynamics (MD) simulations have increasingly contributed to the understanding of biomolecular processes, allowing predictions of thermodynamic and structural properties. Unfortunately, progress towards the holy grail of protein structure prediction was soon found to be severely hampered by the very rugged free energy surface of proteins, in which small relative free energies separate native, folded conformations from unfolded states. These multiple minima frequently trap present-day protein MD simulations for their entire duration. To allow simulations to escape such minima and explore wider portions of conformational space, enhanced sampling techniques were developed. One of the most popular, replica exchange molecular dynamics (REMD), runs multiple MD simulations in parallel on replicas of a system at increasing temperatures T1, T2, etc. Monte Carlo exchange moves are attempted periodically, allowing conformations to exchange temperature ensembles with a probability that depends on their potential energy difference and temperature difference. Conformations are thus simulated at all temperatures and escape local minima with the kinetic energy provided at higher temperatures, while Boltzmann distributions are generated at every temperature. REMD has been successful in the ab initio folding of a variety of small peptides and proteins (up to 20-30 residues). With larger proteins, however, the overlap of the potential energy distributions of neighboring replicas diminishes: the potential energy scales as f·kB·T but its fluctuation only as √f·kB·T, where f is the number of degrees of freedom of the system, kB Boltzmann's constant and T the temperature, so the gap between neighboring distributions grows faster than their widths. Consequently, the Monte Carlo exchange probability, and with it the number of exchanges in a simulation, also diminishes.
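The exchange step and the overlap argument can be made concrete with a short Python sketch. It uses the standard Metropolis criterion for temperature replica exchange, accepting a swap with probability min{1, exp[(1/kB·Ti − 1/kB·Tj)(Ei − Ej)]}, and estimates the mean acceptance between two neighboring temperatures for a model system whose potential energy is Gaussian with mean f·kB·T/2 and standard deviation √(f/2)·kB·T. The harmonic model, the function names and the chosen temperatures are illustrative assumptions, not part of the thesis.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def exchange_probability(e_i: float, e_j: float, t_i: float, t_j: float) -> float:
    """Metropolis acceptance probability for swapping the configurations of
    replicas i and j (potential energies in kJ/mol, temperatures in K)."""
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the swap is accepted."""
    return rng.random() < exchange_probability(e_i, e_j, t_i, t_j)


def mean_acceptance(f: int, t_low: float, t_high: float,
                    n_samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the average acceptance between two temperatures
    for a model with f harmonic degrees of freedom (assumption):
    <E> = f kB T / 2 and sigma_E = sqrt(f / 2) kB T."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        e_low = rng.gauss(0.5 * f * K_B * t_low, math.sqrt(0.5 * f) * K_B * t_low)
        e_high = rng.gauss(0.5 * f * K_B * t_high, math.sqrt(0.5 * f) * K_B * t_high)
        total += exchange_probability(e_low, e_high, t_low, t_high)
    return total / n_samples


if __name__ == "__main__":
    for f in (300, 3000, 30000):
        print(f, round(mean_acceptance(f, 300.0, 310.0), 3))
```

For a fixed 300 K to 310 K interval, the estimated acceptance falls from roughly 0.8 for f = 300 to below 0.01 for f = 30 000. In this model the acceptance depends on f only through √f·ΔT/T, so keeping it constant requires ΔT/T to shrink as 1/√f, and the number of replicas needed for a given range grows accordingly.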
This is generally compensated by choosing smaller temperature intervals between replicas, thereby increasing the number of replicas (and the computational cost of the simulation) needed to cover a given temperature range. An additional problem arises in explicit solvent simulations, in which solvent-solvent interactions account for the largest part of the total potential energy. Consequently, explicit solvent REMD simulations almost exclusively sample solvent degrees of freedom. These two limitations have led to the development of REMD protocols for large explicit solvent systems in which exchange probabilities are computed with subsystem (e.g. protein-only) potential energy functions, allowing targeted sampling of protein degrees of freedom and a reduction of the computational effort. In this thesis, this approximation is tested by implementing its simplest variant, which entirely neglects solvent-solvent interactions, as a new REMD protocol termed REMDpe (Chapter 2). Possible REMD limitations for large explicit solvent systems are tested with REMDpe (Chapter 3), which is further applied to a thorough, comparative investigation of prion (Chapter 4) and doppel (Chapter 5) protein misfolding. In Chapter 2, the practical validity of the REMDpe approximation is assessed with simulations of the prion protein, a system that is too large (103 residues) to be simulated efficiently with traditional REMD over the necessary temperature range. A first validation consists of testing whether the protein and total potential energy distributions are consistent with their analogs from straightforward reference MD simulations. Second, the overlap pattern of the total potential energy distributions at different temperatures is characterized, showing that the exchanges in the REMDpe simulations are performed according to a Boltzmann weight.
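The difference between a total-energy exchange and one that neglects solvent-solvent interactions can be sketched in Python. The sketch assumes that each configuration's potential energy has been decomposed into protein-protein, protein-solvent and solvent-solvent terms, and that the exchange energy is the total minus the solvent-solvent term. The class and function names and the numbers in the example are hypothetical, and the actual REMDpe implementation may differ in detail.

```python
import math
from dataclasses import dataclass

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


@dataclass
class EnergyTerms:
    """Hypothetical decomposition of one configuration's potential energy (kJ/mol)."""
    protein_protein: float
    protein_solvent: float
    solvent_solvent: float

    @property
    def total(self) -> float:
        return self.protein_protein + self.protein_solvent + self.solvent_solvent

    @property
    def without_solvent_solvent(self) -> float:
        # Exchange energy when solvent-solvent interactions are neglected
        return self.protein_protein + self.protein_solvent


def acceptance(e_i: float, e_j: float, t_i: float, t_j: float) -> float:
    """Metropolis criterion min(1, exp[(1/kB Ti - 1/kB Tj)(Ei - Ej)])."""
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def total_energy_acceptance(a: EnergyTerms, b: EnergyTerms, t_a: float, t_b: float) -> float:
    return acceptance(a.total, b.total, t_a, t_b)


def solute_energy_acceptance(a: EnergyTerms, b: EnergyTerms, t_a: float, t_b: float) -> float:
    return acceptance(a.without_solvent_solvent, b.without_solvent_solvent, t_a, t_b)


# Made-up energies for two configurations at neighboring temperatures
cold = EnergyTerms(protein_protein=-5000.0, protein_solvent=-3000.0, solvent_solvent=-400000.0)
hot = EnergyTerms(protein_protein=-5010.0, protein_solvent=-2995.0, solvent_solvent=-398000.0)
print(total_energy_acceptance(cold, hot, 300.0, 310.0))   # vanishingly small
print(solute_energy_acceptance(cold, hot, 300.0, 310.0))  # 1.0
```

With these invented energies, the two configurations have nearly identical protein-related energies but solvent-solvent energies differing by 2000 kJ/mol: the total-energy criterion all but forbids the swap, whereas the criterion without solvent-solvent terms accepts it. Whether swaps accepted in this way still generate correct Boltzmann ensembles is what the validations described above examine.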
Third, native structures are found to have the lowest protein and total potential energies, compared with the higher energies of various unfolded structures. Although no obvious bias is detected in these three validations, the conformational landscapes sampled by the REMDpe simulation at low temperatures progressively shift to non-native regions of the free energy surface. In Chapter 3, this phenomenon is quantified and its origin is identified: the low-temperature residence times are insufficient for refolding to native-like structures. REMD assumes that systems are decorrelated between exchange attempts. Increasing the time between exchanges accordingly would allow for decorrelation and sufficient low-temperature residence times, but is practically impossible to achieve for large protein simulations; this highlights a major limitation and possible source of bias of present-day REMD simulations. We chose the prion protein as a test case because of its link to transmissible spongiform encephalopathies. Diseases of this category are believed to be caused by a rare misfolding of the prion protein (PrP) from the cellular, monomeric, soluble, α-helical PrPC isoform to a pathogenic, aggregated, insoluble, β-rich PrPSc isoform of unknown structure. Experimental determination of the PrPSc structure has remained elusive, which has aroused interest in predictions from computer simulations. REMD provides a powerful tool to explore a diversity of misfolds and to select stable ones that accumulate at lower temperatures. In Chapter 4, we describe a PrP REMDpe simulation in which rare new β-strands form and arrange into a multitude of different β-sheets, reproducing the α-helix → β-sheet conversion observed in circular dichroism spectra. The α-helical and β-sheet propensities along the sequence can thus be computed. We develop and apply the β contact map clustering (bcmc) protocol to identify the most frequent β-sheet patterns defining β-rich folds.
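One way to quantify the residence-time problem is to follow a single replica through the temperature ladder, measure how long each uninterrupted visit to the lowest temperature lasts, and compare those durations with the time needed for refolding. The Python sketch below does this for a trace of temperature indices. The trace, the 2 ps exchange interval and the required time are invented for illustration, and the function names are hypothetical; the analysis in Chapter 3 may use different observables.

```python
from itertools import groupby
from typing import List, Sequence


def residence_times(temperature_index: Sequence[int], target: int = 0,
                    exchange_interval_ps: float = 1.0) -> List[float]:
    """Durations (ps) of uninterrupted visits of one replica to ladder index `target`.

    temperature_index[k] is the temperature index held by the replica during the
    k-th interval between exchange attempts (index 0 = lowest temperature).
    """
    return [
        sum(1 for _ in run) * exchange_interval_ps
        for index, run in groupby(temperature_index)
        if index == target
    ]


def fraction_long_enough(times: Sequence[float], required_ps: float) -> float:
    """Fraction of visits lasting at least `required_ps` (e.g. an assumed refolding time)."""
    if not times:
        return 0.0
    return sum(t >= required_ps for t in times) / len(times)


# Hypothetical temperature-index trace of one replica, 2 ps between exchange attempts
trace = [0, 0, 0, 1, 2, 1, 0, 0, 1, 0]
visits = residence_times(trace, target=0, exchange_interval_ps=2.0)
print(visits)                             # [6.0, 4.0, 2.0]
print(fraction_long_enough(visits, 4.0))  # 0.6666666666666666
```

If the time required for refolding greatly exceeds typical visit lengths, few or no visits are long enough, which is the situation identified in Chapter 3; lengthening the exchange interval raises every duration but, as noted above, quickly becomes unaffordable for large systems.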
Ten new β-rich folds are found and compared with recent experimental data characterizing PrPSc, providing atomistically detailed models for putative monomeric precursors of PrPSc or β-oligomeric conformations. In Chapter 5, an analogous simulation is performed with doppel, a structural homolog of the prion protein (with an identical fold of three α-helices and two β-strands) originating from the same gene family, but characterized by a different sequence (only 25% sequence homology), expression pattern and physiological function. Being unrelated to amyloid neurodegenerative diseases, doppel is an ideal test system for investigating the misfolding of a non-amyloidogenic protein. Prion and doppel misfolding are compared in their monomeric forms in the quest to identify prion-specific features that might reveal the mechanism of conversion to PrPSc. In agreement with experiments, we find a lower thermal stability for doppel. Surprisingly, we also observe β-rich forms of doppel; however, the β-rich folds of the two proteins are very different. Moreover, a major difference is found in the free energy barriers leading from the native structure to such conformations, and to non-native conformations in general: these barriers are low for prion and can already be crossed at 300 K, whereas for doppel they are at least three times higher. This difference suggests that monomeric prion, compared with doppel, has an intrinsic propensity to misfold and become enriched in β-structure.
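The barrier comparison can be illustrated with the common histogram estimate of a free energy profile, F(x) = −kB·T·ln P(x), along a one-dimensional coordinate x (for instance a structural distance from the native state; this choice of coordinate is an assumption). The Python sketch below uses synthetic two-basin data rather than simulation output, and its function names and basin boundaries are hypothetical; the barrier estimates in the thesis may rely on different coordinates or estimators.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def free_energy_profile(samples, temperature, bins=50, value_range=None):
    """F(x) = -kB T ln P(x) along a 1D coordinate, shifted so that min(F) = 0.
    Bins without samples get F = +inf."""
    counts, edges = np.histogram(samples, bins=bins, range=value_range)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        f = -K_B * temperature * np.log(counts / counts.sum())
    f -= f[np.isfinite(f)].min()
    return centers, f


def barrier_height(centers, f, native_max, misfold_min):
    """Highest free energy between the native basin (x < native_max) and the
    misfolded region (x >= misfold_min), relative to the native minimum."""
    native = centers < native_max
    between = (centers >= native_max) & (centers < misfold_min)
    return f[between].max() - f[native].min()


def two_state_samples(rng, width, n=200_000):
    """Synthetic samples from two equally populated Gaussian basins at x = 0.2 and x = 0.8."""
    return np.concatenate([rng.normal(0.2, width, n // 2), rng.normal(0.8, width, n // 2)])


rng = np.random.default_rng(0)
barriers = {}
for width in (0.12, 0.08):
    centers, f = free_energy_profile(two_state_samples(rng, width), 300.0, value_range=(0.0, 1.0))
    barriers[width] = barrier_height(centers, f, native_max=0.35, misfold_min=0.65)
    print(width, round(barriers[width], 2), "kJ/mol")
```

Narrowing the basins raises the estimated barrier. An empty bin in the transition region yields an infinite estimate, which signals that the barrier was not crossed during sampling rather than providing a number.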
