Proteome Digestion Specificity Analysis for Rational Design of Extended Bottom-up and Middle-down Proteomics Experiments
Mass spectrometry (MS)-based bottom-up proteomics (BUP) is currently the method of choice for large-scale identification and characterization of proteins present in complex samples, such as cell lysates, body fluids, or tissues. Technically, BUP relies on MS analysis of complex mixtures of small (<3 kDa) peptides resulting from whole-proteome digestion. Because of the extremely high sample complexity, further development of detection methods and sample preparation techniques is necessary. In recent years, alternative approaches such as middle-down proteomics (MDP, addressing peptides of up to 15 kDa) and top-down proteomics (TDP, addressing proteins exceeding 15 kDa) have attracted particular interest. Here we report a bioinformatics study of both common and less frequently employed digestion procedures for complex protein mixtures, specifically targeting the MDP approach. The aim of this study was to maximize the yield of protein structure information from MS data by optimizing peptide size distribution and sequence specificity. We classified peptides into four categories based on molecular weight: 0.6-3 kDa (classical BUP), 3-7 kDa (extended BUP), 7-15 kDa (MDP), and >15 kDa (TDP). Because of instrumentation-related considerations, we first advocate the extended BUP approach as a potential near-future improvement of BUP. We therefore chose to optimize the number of unique peptides in the 3-7 kDa range while maximizing the number of represented proteins. The present study considers human, yeast, and bacterial proteomes. Results of the study can be further used for designing extended BUP or MDP experimental workflows.
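The size classification and the optimization target described above can be illustrated with a minimal in silico digestion sketch. It is not the pipeline used in this study: the cleavage rule (cut C-terminal to a chosen set of residues, no missed cleavages, no proline exception), the handling of bin boundaries, the reading of "unique" as "distinct sequence", and the function names and toy sequences are all assumptions made for illustration. Only the four mass ranges come from the text.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide (free N- and C-termini)


def digest(sequence, cleave_after):
    """Cleave C-terminal to every residue in `cleave_after`.

    Simplifying assumptions: complete digestion, no missed cleavages,
    no exceptions such as the proline rule.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide, in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def size_class(mass_da):
    """Assign one of the four mass categories from the text.

    Which bin a mass exactly on a boundary falls into is an assumption.
    """
    kda = mass_da / 1000.0
    if kda < 0.6:
        return "below range"
    if kda < 3:
        return "classical BUP"
    if kda < 7:
        return "extended BUP"
    if kda <= 15:
        return "MDP"
    return "TDP"


def summarize(proteome, cleave_after, target="extended BUP"):
    """Count distinct peptides in `target` and the proteins they represent.

    'Unique' is taken here to mean distinct sequence (an assumption).
    """
    peptides, proteins = set(), set()
    for name, seq in proteome.items():
        for pep in digest(seq, cleave_after):
            if size_class(peptide_mass(pep)) == target:
                peptides.add(pep)
                proteins.add(name)
    return len(peptides), len(proteins)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    toy = {"P1": "L" * 40 + "KGGR", "P2": "L" * 40 + "KAAAAA", "P3": "GGK"}
    for rule in ("KR", "R"):
        print(rule, summarize(toy, rule))
```

Comparing the two counts across different cleavage rules, as in the final loop, mirrors the idea of choosing a digestion specificity that yields many 3-7 kDa peptides while still covering as many proteins as possible.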