Tandem Mass Spectrometry Protein Identification on a PC Grid

We present a method to grid-enable tandem mass spectrometry protein identification. The implemented parallelization strategy embeds the open-source x!tandem tool in a grid-enabled workflow, allowing rapid analysis of large-scale mass spectrometry experiments on existing heterogeneous hardware. We explore different data-splitting schemes, considering splits of both the spectra datasets and the protein databases, and examine the impact of each scheme on scoring and computation time. Although the resulting peptide e-values fluctuate, we show that these variations are small, stem from statistical rather than numerical instability, and are not specific to the grid environment. The correlation coefficient between results obtained on a standalone machine and in the grid environment exceeds 0.933 for spectrum identification and 0.984 for protein identification, demonstrating the validity of our approach. Finally, we examine how different splitting schemes for the spectra and protein data affect CPU time and overall wall-clock time, showing that judicious splitting of both data sets yields the best overall performance.
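
The abstract describes parallelizing the search by splitting the spectra set, the protein database, or both. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: each (spectra chunk, database chunk) pair becomes one independent job (in the paper, an x!tandem run on a grid node; the search itself is not shown here), partial results are merged by keeping each spectrum's lowest e-value, and agreement with a standalone run is measured as a Pearson correlation of -log10 e-values. The helper names (`chunk`, `make_jobs`, `merge_best_hits`, `correlate_evalues`), the merge rule, and the log scale are all hypothetical choices made for illustration.

```python
from itertools import product
from math import log10, sqrt


def chunk(items, n):
    """Split items into n contiguous chunks of near-equal size (hypothetical helper)."""
    k, r = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)  # first r chunks get one extra item
        chunks.append(items[start:end])
        start = end
    return chunks


def make_jobs(spectra, proteins, n_spectra_chunks, n_db_chunks):
    """Cross every spectra chunk with every database chunk: one grid job per pair."""
    return list(product(chunk(spectra, n_spectra_chunks), chunk(proteins, n_db_chunks)))


def merge_best_hits(job_results):
    """Keep, for each spectrum, the hit with the lowest e-value across all jobs.

    job_results: iterable of dicts mapping spectrum_id -> (peptide, e_value).
    The lowest-e-value merge rule is an assumption for this sketch.
    """
    best = {}
    for result in job_results:
        for spec_id, (peptide, evalue) in result.items():
            if spec_id not in best or evalue < best[spec_id][1]:
                best[spec_id] = (peptide, evalue)
    return best


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlate_evalues(standalone, grid):
    """Pearson r of -log10(e-value) over spectra identified in both runs (log scale assumed)."""
    shared = sorted(set(standalone) & set(grid))
    xs = [-log10(standalone[s][1]) for s in shared]
    ys = [-log10(grid[s][1]) for s in shared]
    return pearson(xs, ys)
```

For example, 10 spectra split into 3 chunks and 4 proteins split into 2 chunks give `make_jobs(...)` six jobs. Splitting the database raises parallelism, but because x!tandem estimates each e-value from the scores of the candidates it actually searched, a spectrum scored against a database fragment may receive a somewhat different e-value than against the full database; this is one plausible source of the statistical fluctuation the abstract mentions.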

Published in:
From Genes to Personalized HealthCare: Grid Solutions for the Life Sciences - Proceedings of HealthGrid 2007, vol. 126, pp. 3-12
Presented at:
5th HealthGrid Conference, Geneva, Switzerland, 22-26 April 2007
Netherlands, IOS Press

Record created 2007-03-07, last modified 2019-03-16
