000105065 001__ 105065
000105065 005__ 20190316234015.0
000105065 0247_ $$2doi$$a10.1186/1471-2105-8-S4-S10
000105065 022__ $$a1471-2105
000105065 02470 $$2DAR$$a10923
000105065 02470 $$2ISI$$a000247557800010
000105065 037__ $$aARTICLE
000105065 245__ $$aClustering protein environments for function prediction: finding PROSITE motifs in 3D
000105065 269__ $$a2007
000105065 260__ $$bBioMed Central$$c2007
000105065 336__ $$aJournal Articles
000105065 520__ $$aBackground: Structural genomics initiatives are producing increasing numbers of three-dimensional (3D) structures for which there is little functional information. Structure-based annotation of molecular function is therefore becoming critical. We previously presented FEATURE, a method for describing microenvironments around functional sites in proteins. However, FEATURE uses supervised machine learning and so is limited to building models for sites of known importance and location. We hypothesized that there are a large number of sites in proteins that are associated with function that have not yet been recognized. Toward that end, we have developed a method for clustering protein microenvironments in order to evaluate the potential for discovering novel sites that have not been previously identified. Results: We have prototyped a computational method for rapid clustering of millions of microenvironments in order to discover residues whose surrounding environments are similar and which may therefore share a functional or structural role. We clustered nearly 2,000,000 environments from 9,600 protein chains and defined 4,550 clusters. As a preliminary validation, we asked whether known 3D environments associated with PROSITE motifs were "rediscovered". We found examples of clusters highly enriched for residues that share PROSITE sequence motifs. Conclusion: Our results demonstrate that we can cluster protein environments successfully using a simplified representation and the K-means clustering algorithm. The rediscovery of known 3D motifs allows us to calibrate the size and intercluster distances that characterize useful clusters. This information will then allow us to find new clusters with similar characteristics that represent novel structural or functional sites.
000105065 700__ $$aYoon, Sungroh
000105065 700__ $$aEbert, Jessica C.
000105065 700__ $$aChung, Eui-Young
000105065 700__ $$0240269$$g167918$$aDe Micheli, Giovanni
000105065 700__ $$aAltman, Russ B.
000105065 773__ $$j8$$tBMC Bioinformatics$$kSuppl. 4$$qS10
000105065 8564_ $$uhttps://infoscience.epfl.ch/record/105065/files/1471-2105-8-S4-S10.pdf$$zn/a$$s662297$$yn/a
000105065 909C0 $$xU11140$$0252283$$pLSI1
000105065 909CO $$particle$$qGLOBAL_SET$$ooai:infoscience.epfl.ch:105065$$pSTI$$pIC
000105065 917Z8 $$x148230
000105065 917Z8 $$x112915
000105065 937__ $$aEPFL-ARTICLE-105065
000105065 973__ $$rREVIEWED$$sPUBLISHED$$aEPFL
000105065 980__ $$aARTICLE