Infoscience

Student project

Development of an interactive genome browser to visualize and analyse large scale genomic data

Genomic bioinformatics is a growing and developing field. Data analysis is becoming an integral and essential part of any quantitative biological experiment as technologies evolve and wet-lab methods generate ever larger quantities of data. Yet few standards have emerged: a plethora of analytical tools exists, but none has become established. The difficulties arise early on, even before any genomic data is processed, because one first needs to visualize it. Several visualization tools exist, such as the UCSC Genome Browser, IGB, or Argo, but none offers a satisfying interface or set of tools. Stemming from a pre-existing project at the bioinformatics and biostatistics core facility, this study presents a new solution to the multiple difficulties that at present beleaguer the field. A novel genome visualization tool is proposed in which the user interface remains simple and incorporates a set of common statistical analysis functions. The software produced, entitled gFeatMiner, is capable of processing large-scale genomic datasets to compute descriptive statistics and of manipulating them in several ways. The program makes use of modern technologies and infrastructure, paving the way for its development into an advanced data mining tool. In the second part of this study, a practical application is worked out. We examine the genes coding for ribosomal proteins in the model organism yeast (Saccharomyces cerevisiae), using several available datasets that include multiple transcription factor binding profiles in vivo and in vitro, RNA polymerase activity, and nucleosome enrichment. By clustering these genes according to different criteria and machine learning strategies, we attempt to better understand and reveal the underlying cellular mechanisms.
